nep-big New Economics Papers
on Big Data
Issue of 2019‒04‒01
twenty papers chosen by
Tom Coupé
University of Canterbury

  1. Machine Learning Methods Economists Should Know About By Susan Athey; Guido Imbens
  2. When are Google data useful to nowcast GDP? An approach via pre-selection and shrinkage By Laurent Ferrara; Anna Simoni
  3. Digitalization of manufacturing process and open innovation: Survey results of small and medium sized firms in Japan By MOTOHASHI Kazuyuki
  4. Asymptotic Expansion as Prior Knowledge in Deep Learning Method for high dimensional BSDEs (Forthcoming in Asia-Pacific Financial Markets) By Masaaki Fujii; Akihiko Takahashi; Masayuki Takahashi
  5. Use and sharing of big data, firm networks and their performance By KIM YoungGak; MOTOHASHI Kazuyuki
  6. El efecto globo: Identificación de regiones propensas a la producción de coca By Juan Sebastian Moreno
  7. Let There Be Light: Trade and the Development of Border Regions By Marius Brülhart; Olivier Cadot; Alexander Himbert
  8. Let There Be Light: Trade and the Development of Border Regions By Marius BRÜLHART; Olivier CADOT; Alexander HIMBERT
  9. Let There Be Light: Trade and the Development of Border Regions By Marius BRÜLHART; Olivier CADOT; Alexander HIMBERT
  10. Migration and the Value of Social Networks By Blumenstock, Joshua; Chi, Guanghua; Tan, Xu
  11. The Wrong Kind of AI? Artificial Intelligence and the Future of Labor Demand By Daron Acemoglu; Pascual Restrepo
  12. Machine Learning for Pricing American Options in High Dimension By Ludovic Goudenège; Andrea Molent; Antonino Zanette
  13. Use of new information technology such as AI and worker well-being: Evidence from panel data analysis (Japanese) By YAMAMOTO Isamu; KURODA Sachiko
  14. Towards hybrid price discrimination via neighbours properties in network-driven economy By Jacopo Arpetti; Antonio Iovanella
  15. Bayesian MIDAS penalized regressions: estimation, selection, and prediction By Matteo Mogliani
  16. Stacked Monte Carlo for option pricing By Antoine Jacquier; Emma R. Malone; Mugad Oumgari
  17. A Machine Learning approach to Risk Minimisation in Electricity Markets with Coregionalized Sparse Gaussian Processes By Daniel Poh; Stephen Roberts; Martin Tegnér
  18. Fake News and Propaganda: Trump's Democratic America and Hitler's National Socialist (Nazi) Germany By Allen, D.E.; McAleer, M.J.
  19. Early Life Exposure to Pollution: Effect of Seasonal Open Biomass Burning on Child Health in India By Singh, Prachi; Dey, Sagnik; Chowdhury, Sourangsu
  20. Cities, Lights, and Skills in Developing Economies By Jonathan I. Dingel; Antonio Miscio; Donald R. Davis

  1. By: Susan Athey; Guido Imbens
    Abstract: We discuss the relevance of the recent Machine Learning (ML) literature for economics and econometrics. First we discuss the differences in goals, methods and settings between the ML literature and the traditional econometrics and statistics literatures. Then we discuss some specific methods from the machine learning literature that we view as important for empirical researchers in economics. These include supervised learning methods for regression and classification, unsupervised learning methods, as well as matrix completion methods. Finally, we highlight newly developed methods at the intersection of ML and econometrics, methods that typically perform better than either off-the-shelf ML or more traditional econometric methods when applied to particular classes of problems, problems that include causal inference for average treatment effects, optimal policy estimation, and estimation of the counterfactual effect of price changes in consumer choice models.
    Date: 2019–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1903.10075&r=all
  2. By: Laurent Ferrara (Banque de France); Anna Simoni (CREST; CNRS.)
    Abstract: Nowcasting GDP growth is extremely useful for policy-makers to assess macroeconomic conditions in real-time. In this paper, we aim at nowcasting euro area GDP with a large database of Google search data. Our objective is to check whether this specific type of information can be useful to increase GDP nowcasting accuracy, and when, once we control for official variables. In this respect, we estimate shrunk bridge regressions that integrate Google data optimally screened through a targeting method, and we empirically show that this approach provides some gain in pseudo-real-time nowcasting of euro area GDP quarterly growth. In particular, we find that Google data bring useful information for GDP nowcasting for the first four weeks of the quarter, when macroeconomic information is lacking. However, as soon as official data become available, their relative nowcasting power vanishes. In addition, a true real-time analysis confirms that Google data constitute a reliable alternative when official data are lacking.
    Keywords: Nowcasting, Big data, Google search data, Sure Independence Screening, Ridge Regularization.
    Date: 2019–02–21
    URL: http://d.repec.org/n?u=RePEc:crs:wpaper:2019-04&r=all
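The screen-then-shrink idea named in the abstract and keywords of item 2 (Sure Independence Screening followed by ridge regularization) can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic data, the function names `correlation`, `screen` and `ridge`, the number of predictors kept and the penalty `lam` are all assumptions made for illustration, and the bridge-equation step that links high-frequency data to quarterly GDP is left out.

```python
import random


def correlation(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def screen(columns, y, keep):
    """Keep the `keep` predictors with the largest absolute marginal correlation with y."""
    ranked = sorted(range(len(columns)), key=lambda j: -abs(correlation(columns[j], y)))
    return sorted(ranked[:keep])


def ridge(columns, y, lam):
    """Solve (X'X + lam*I) b = X'y by Gauss-Jordan elimination (no intercept)."""
    p = len(columns)
    # Augmented matrix [X'X + lam*I | X'y].
    a = [[sum(u * v for u, v in zip(columns[i], columns[j])) + (lam if i == j else 0.0)
          for j in range(p)] + [sum(u * v for u, v in zip(columns[i], y))]
         for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * z for x, z in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


# Hypothetical data: 10 candidate predictors, only columns 0 and 3 matter.
rng = random.Random(0)
n, p = 200, 10
columns = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
y = [2.0 * columns[0][i] - 3.0 * columns[3][i] + rng.gauss(0, 0.1) for i in range(n)]

selected = screen(columns, y, keep=2)                        # screening step
coefs = ridge([columns[j] for j in selected], y, lam=1.0)    # shrinkage step
print(selected, coefs)
```

Screening first keeps the linear system small, so the ridge step only has to handle the predictors that survive; in practice both `keep` and `lam` would be chosen by some validation procedure rather than fixed by hand.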
  3. By: MOTOHASHI Kazuyuki
    Abstract: Digitalization has a transformative impact on innovation in firms and industry. In this paper, the results of the Survey on the Changing Nature of Manufacturing Processes and New Product Development are presented to show how the nature of Japanese SMEs in manufacturing industry is changing in the new IT era (AI, big data and IoT). It is found that a firm applying new IT, such as data analytics by machine learning, is likely to be involved in delivering digital services as well as new products (servitalization) and innovation ecosystem, interacting with multiple firms. Such firms address wider customer needs, instead of just meeting existing customer requirements, meaning that its product innovation is likely to happen in new business fields. In addition, a firm which extensively uses its customer data gains more sales and profit contributions from its new product.
    Date: 2019–03
    URL: http://d.repec.org/n?u=RePEc:eti:polidp:19005&r=all
  4. By: Masaaki Fujii (Quantitative Finance Course, Graduate School of Economics, The University of Tokyo); Akihiko Takahashi (Quantitative Finance Course, Graduate School of Economics, The University of Tokyo); Masayuki Takahashi (Quantitative Finance Course, Graduate School of Economics, The University of Tokyo)
    Abstract: We demonstrate that the use of asymptotic expansion as prior knowledge in the "deep BSDE solver", which is a deep learning method for high dimensional BSDEs proposed by Weinan E, Han & Jentzen (2017), drastically reduces the loss function and accelerates the speed of convergence. We illustrate the technique and its implications by using Bergman's model with different lending and borrowing rates as a typical model for FVA as well as a class of solvable BSDEs with quadratic growth drivers. We also present an extension of the deep BSDE solver for reflected BSDEs representing American option prices.
    Date: 2019–03
    URL: http://d.repec.org/n?u=RePEc:cfi:fseres:cf456&r=all
  5. By: KIM YoungGak; MOTOHASHI Kazuyuki
    Abstract: RIETI conducted the Survey of Big Data Use and Innovation in Japanese Manufacturing Firms in 2015. This paper uses this survey data, linked with TSR data of inter-firm transactions, to examine the relationship between supplier and customer (business partner) network structures and the data sharing with these business partners. It is found that, in general, the number of suppliers is positively correlated with the likelihood of internal use of data and data sharing with suppliers, customers, and other third-party firms. On the contrary, the number of customers is negatively correlated with data use and sharing, especially with customers. The analysis results also show that long-term relationships with suppliers contribute negatively to data sharing, but contribute positively to data sharing with customers. Interestingly, the more customers a firm's suppliers have, or the more suppliers a firm's customers have in their transaction networks, the less likely it is that the firm shares big data with other third-party firms. We find that data sharing has a positive and significant impact on firm productivity. However, we find no positive contribution of data sharing to attracting new customers or suppliers. We do not find any significant effect of data sharing on the extensive margin of transactions.
    Date: 2019–03
    URL: http://d.repec.org/n?u=RePEc:eti:dpaper:19016&r=all
  6. By: Juan Sebastian Moreno
    Abstract: The fight against drugs, especially cocaine, has consumed a large share of Colombia's economic, environmental and human resources over the last four decades. Nevertheless, there appears to have been no significant progress in reducing either cocaine trafficking or coca cultivation. One of the main reasons why supply-side anti-narcotics strategies have failed is the so-called Balloon Effect (Efecto Globo), whereby crackdowns on drug production in one region are associated with increases in other regions through displacement. With the aim of definitively reducing coca crops in Colombia during the implementation of the peace accords, this paper seeks to identify the municipalities susceptible to the consequences of the Balloon Effect, that is, to find the municipalities that are potential coca producers, through a prediction exercise. This exercise will make it possible to alert the government so that it can implement targeted policies that prevent the development of coca leaf production. The empirical methodology centers on supervised machine learning techniques, in particular model ensembles built through subbagging, which allow the development of an aggregate prediction model that can classify potential coca-producing municipalities.
    Keywords: Balloon Effect, illicit crops, coca leaf, machine learning, subbagging
    JEL: B49 K42 P37
    Date: 2018–10–29
    URL: http://d.repec.org/n?u=RePEc:col:000508:017216&r=all
  7. By: Marius Brülhart (HEC Lausanne - Faculté des Hautes Etudes Commerciales (HEC Lausanne)); Olivier Cadot (FERDI - Fondation pour les Etudes et Recherches sur le Développement International, UNIL - Université de Lausanne); Alexander Himbert (UNIL - Université de Lausanne)
    Abstract: Does international trade help or hinder the economic development of border regions relative to interior regions? Theory tends to suggest that trade helps, but it can also predict the reverse. The question is policy relevant as regions near land borders are generally poorer, and sometimes more prone to civil conflict, than interior regions. We therefore estimate how changes in bilateral trade volumes affect economic activity along roads running inland from international borders, using satellite night-light measurements for 2,186 border-crossing roads in 138 countries. We observe a significant 'border shadow': on average, lights are 37 percent dimmer at the border than 200 kilometers inland. We find this difference to be reduced by trade expansion as measured by exports and instrumented with tariffs on the opposite side of the border. At the mean, a doubling of exports to a particular neighbor country reduces the gradient of light from the border by some 23 percent. This qualitative finding applies to developed and developing countries, and to rural and urban border regions. Proximity to cities on either side of the border amplifies the effects of trade. We provide evidence that local export-oriented production is a significant mechanism behind the observed effects.
    Keywords: border regions,economic geography,night lights data,Trade liberalization
    Date: 2019–03–18
    URL: http://d.repec.org/n?u=RePEc:hal:wpaper:hal-02071819&r=all
  8. By: Marius BRÜLHART (FERDI); Olivier CADOT (Faculté des hautes études commerciales - Université de Lausanne); Alexander HIMBERT (University of Lausanne)
    Abstract: Does international trade help or hinder the economic development of border regions relative to interior regions? Theory tends to suggest that trade helps, but it can also predict the reverse. The question is policy relevant as regions near land borders are generally poorer, and sometimes more prone to civil conflict, than interior regions. We therefore estimate how changes in bilateral trade volumes affect economic activity along roads running inland from international borders, using satellite night-light measurements for 2,186 border-crossing roads in 138 countries. We observe a significant ‘border shadow’: on average, lights are 37 percent dimmer at the border than 200 kilometers inland. We find this difference to be reduced by trade expansion as measured by exports and instrumented with tariffs on the opposite side of the border. At the mean, a doubling of exports to a particular neighbor country reduces the gradient of light from the border by some 23 percent. This qualitative finding applies to developed and developing countries, and to rural and urban border regions. Proximity to cities on either side of the border amplifies the effects of trade. We provide evidence that local export-oriented production is a significant mechanism behind the observed effects.
    Keywords: Trade liberalization, border regions, economic geography, night lights data
    JEL: F14 F15 R11 R12
    Date: 2019–02
    URL: http://d.repec.org/n?u=RePEc:fdi:wpaper:4784&r=all
  9. By: Marius BRÜLHART (FERDI); Olivier CADOT (Faculté des hautes études commerciales - Université de Lausanne); Alexander HIMBERT (University of Lausanne)
    Abstract: Does international trade help or hinder the economic development of border regions relative to interior regions? Theory tends to suggest that trade helps, but it can also predict the reverse. The question is policy relevant as regions near land borders are generally poorer, and sometimes more prone to civil conflict, than interior regions. We therefore estimate how changes in bilateral trade volumes affect economic activity along roads running inland from international borders, using satellite night-light measurements for 2,186 border-crossing roads in 138 countries. We observe a significant ‘border shadow’: on average, lights are 37 percent dimmer at the border than 200 kilometers inland. We find this difference to be reduced by trade expansion as measured by exports and instrumented with tariffs on the opposite side of the border. At the mean, a doubling of exports to a particular neighbor country reduces the gradient of light from the border by some 23 percent. This qualitative finding applies to developed and developing countries, and to rural and urban border regions. Proximity to cities on either side of the border amplifies the effects of trade. We provide evidence that local export-oriented production is a significant mechanism behind the observed effects.
    Keywords: Trade liberalization, border regions, economic geography, night lights data
    JEL: F14 F15 R11 R12
    Date: 2019–02
    URL: http://d.repec.org/n?u=RePEc:fdi:wpaper:4785&r=all
  10. By: Blumenstock, Joshua; Chi, Guanghua; Tan, Xu
    Abstract: What is the value of a social network? Prior work suggests two distinct mechanisms that have historically been difficult to differentiate: as a conduit of information, and as a source of social and economic support. We use a rich 'digital trace' dataset to link the migration decisions of millions of individuals to the topological structure of their social networks. We find that migrants systematically prefer 'interconnected' networks (where friends have common friends) to 'expansive' networks (where friends are well connected). A micro-founded model of network-based social capital helps explain this preference: migrants derive more utility from networks that are structured to facilitate social support than from networks that efficiently transmit information.
    Keywords: Big Data; Development; migration; networks; social capital; Social Networks
    JEL: D85 O12 O15 R23 Z13
    Date: 2019–03
    URL: http://d.repec.org/n?u=RePEc:cpr:ceprdp:13611&r=all
  11. By: Daron Acemoglu; Pascual Restrepo
    Abstract: Artificial Intelligence is set to influence every aspect of our lives, not least the way production is organized. AI, as a technology platform, can automate tasks previously performed by labor or create new tasks and activities in which humans can be productively employed. Recent technological change has been biased towards automation, with insufficient focus on creating new tasks where labor can be productively employed. The consequences of this choice have been stagnating labor demand, declining labor share in national income, rising inequality and lower productivity growth. The current tendency is to develop AI in the direction of further automation, but this might mean missing out on the promise of the "right" kind of AI with better economic and social outcomes.
    JEL: J23 J24
    Date: 2019–03
    URL: http://d.repec.org/n?u=RePEc:nbr:nberwo:25682&r=all
  12. By: Ludovic Goudenège; Andrea Molent; Antonino Zanette
    Abstract: In this paper we propose an efficient method to compute the price of American basket options, based on Machine Learning and Monte Carlo simulations. Specifically, the options we consider are written on a basket of assets, each of them following a Black-Scholes dynamics. The method we propose is a backward dynamic programming algorithm which considers a finite number of uniformly distributed exercise dates. On these dates, the value of the option is computed as the maximum between the exercise value and the continuation value, which is approximated via Gaussian Process Regression. Specifically, we consider a finite number of points, each of them representing the values reached by the underlying at a certain time. First of all, we compute the continuation value only for these points by means of Monte Carlo simulations and then we employ Gaussian Process Regression to approximate the whole continuation value function. Numerical tests show that the algorithm is fast and reliable and it can handle also American options on very large baskets of assets, overcoming the problem of the curse of dimensionality.
    Date: 2019–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1903.11275&r=all
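The backward dynamic programming step described in item 12 (at each exercise date, take the maximum of the exercise value and the continuation value) can be sketched on a much simpler problem. The example below is an assumption-laden illustration, not the paper's method: it prices a single-asset American put on a Cox-Ross-Rubinstein binomial tree, so the continuation value comes from a one-step discounted expectation on the tree instead of Monte Carlo simulation with Gaussian Process Regression. The function name and all parameter values are hypothetical.

```python
import math


def american_put_binomial(s0, strike, rate, sigma, maturity, steps):
    """Price an American put by backward induction on a Cox-Ross-Rubinstein tree."""
    dt = maturity / steps
    up = math.exp(sigma * math.sqrt(dt))
    down = 1.0 / up
    disc = math.exp(-rate * dt)
    prob = (math.exp(rate * dt) - down) / (up - down)  # risk-neutral up probability

    # Option values at maturity: node j has had j up-moves.
    values = [max(strike - s0 * up ** j * down ** (steps - j), 0.0)
              for j in range(steps + 1)]

    # Step backwards through the exercise dates.
    for step in range(steps - 1, -1, -1):
        for j in range(step + 1):
            continuation = disc * (prob * values[j + 1] + (1.0 - prob) * values[j])
            exercise = strike - s0 * up ** j * down ** (step - j)
            values[j] = max(exercise, continuation)
    return values[0]


price = american_put_binomial(s0=100.0, strike=100.0, rate=0.05,
                              sigma=0.2, maturity=1.0, steps=200)
print(round(price, 3))
```

A tree like this becomes infeasible as the number of underlying assets grows, which is the curse of dimensionality that the abstract says the regression-based approximation of the continuation value is meant to overcome.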
  13. By: YAMAMOTO Isamu; KURODA Sachiko
    Abstract: This paper examines how the introduction and utilization of new information technologies including AI, IoT and big data affect the well-being of workers including their mental health and job engagement, in addition to the types of workers and workplaces that are more influenced by the changes, based on worker panel data. First, looking at the situation of the introduction and utilization of new information technologies shows that workplaces that employ workers with more routine tasks, higher wages, longer working hours, and where there is a focus on operational efficiency, tend to introduce and utilize new information technology. Next, panel data estimation shows that well-being indices such as the mental health index and work engagement index tend to increase after the introduction and utilization of new information technology. Thus, the introduction and utilization of new technologies such as AI can be interpreted as improving well-being, including mental health and work engagement, meaning that the effect of supporting workers is greater than the negative effect caused by the extra workload or learning cost that workers must bear due to new technologies. In addition, it is shown that the impact of such new information technology on well-being is more evident for workers with clear job descriptions, high job discretion, frequent, suddenly changing work demands, and employed in workplaces that conduct work-style reform such as improving operational efficiency, reducing overtime work, promoting morning and evening non-work activities, and promoting paid holidays.
    Date: 2019–03
    URL: http://d.repec.org/n?u=RePEc:eti:rdpsjp:19012&r=all
  14. By: Jacopo Arpetti; Antonio Iovanella
    Abstract: Increased data gathering capacity, together with the spreading of data analytics techniques, has allowed an unprecedented concentration of information related to individuals' preferences in the hands of a few gatekeepers. In such context, the traditional economic literature has been attempting to frame all the data-driven economy features. Such features, although able to bring about a more efficient matching of people and relevant purchase opportunities, also result in distortions and disequilibria, up to market failures. Data-economy market disequilibria can be decrypted by leveraging some of the known network properties, thus obtaining general results suitable for building a new theoretical framework for economic phenomena. Starting from the hypothesis that a digital company can always benefit from an underlying network of consumers or items related to its market, their representation can indeed provide significant competitive advantages, also enhancing the platforms' capacity to implement discriminatory practices by means of an increased ability to estimate individuals' preferences. In the present paper, we propose a measure called Information Patrimony, considering the amount of information available within the system, and we look into how platforms may exploit data stemming from connected profiles within a network, with a view to obtaining competitive advantages. Such information flow may eventually allow the emergence of a new hybrid price discrimination pattern, through which platforms may influence and steer individuals' purchase choices, as well as apply different prices to different customers.
    Date: 2019–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1903.11469&r=all
  15. By: Matteo Mogliani
    Abstract: We propose a new approach to mixed-frequency regressions in a high-dimensional environment that resorts to Group Lasso penalization and Bayesian techniques for estimation and inference. To improve the sparse recovery ability of the model, we also consider a Group Lasso with a spike-and-slab prior. Penalty hyper-parameters governing the model shrinkage are automatically tuned via an adaptive MCMC algorithm. Simulations show that the proposed models have good selection and forecasting performance, even when the design matrix presents high cross-correlation. When applied to U.S. GDP data, the results suggest that financial variables may have some, although limited, short-term predictive content.
    Keywords: MIDAS regressions, penalized regressions, variable selection, forecasting, Bayesian estimation.
    JEL: C11 C22 C53 E37
    Date: 2019
    URL: http://d.repec.org/n?u=RePEc:bfr:banfra:713&r=all
  16. By: Antoine Jacquier; Emma R. Malone; Mugad Oumgari
    Abstract: We introduce a stacking version of the Monte Carlo algorithm in the context of option pricing. Introduced recently for aeronautic computations, this simple technique, in the spirit of current machine learning ideas, learns control variates by approximating Monte Carlo draws with some specified function. We describe the method from first principles and suggest appropriate fits, and show its efficiency to evaluate European and Asian Call options in constant and stochastic volatility models.
    Date: 2019–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1903.10795&r=all
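The role of a control variate in Monte Carlo option pricing, which item 16 builds on, can be shown with a minimal sketch. This is a plain linear control variate, not the stacked (fitted-function) version the abstract describes: the control is the discounted terminal asset price, whose risk-neutral mean is the spot price. The function name, the Black-Scholes parameter values, the path count and the seed are assumptions chosen for illustration.

```python
import math
import random
import statistics


def call_price_with_control_variate(s0, strike, rate, sigma, maturity, n_paths, seed=0):
    """Return (plain price, plain variance, adjusted price, adjusted variance)."""
    rng = random.Random(seed)
    disc = math.exp(-rate * maturity)
    drift = (rate - 0.5 * sigma ** 2) * maturity
    vol = sigma * math.sqrt(maturity)

    # Terminal prices under geometric Brownian motion.
    terminal = [s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0)) for _ in range(n_paths)]
    payoffs = [disc * max(s - strike, 0.0) for s in terminal]
    controls = [disc * s for s in terminal]  # known risk-neutral mean: s0

    mean_p = sum(payoffs) / n_paths
    mean_c = sum(controls) / n_paths
    cov = sum((p - mean_p) * (c - mean_c) for p, c in zip(payoffs, controls)) / (n_paths - 1)
    beta = cov / statistics.variance(controls)  # variance-minimising coefficient

    # Subtract the part of each draw that the control explains.
    adjusted = [p - beta * (c - s0) for p, c in zip(payoffs, controls)]
    return (mean_p, statistics.variance(payoffs),
            sum(adjusted) / n_paths, statistics.variance(adjusted))


plain_price, plain_var, cv_price, cv_var = call_price_with_control_variate(
    s0=100.0, strike=100.0, rate=0.05, sigma=0.2, maturity=1.0, n_paths=20000)
print(plain_price, cv_price, cv_var / plain_var)
```

The ratio `cv_var / plain_var` shows how much of the per-draw variance the control removes; a fitted function that tracks the payoff more closely than a straight line in the terminal price would be expected to remove more.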
  17. By: Daniel Poh; Stephen Roberts; Martin Tegnér
    Abstract: The non-storability of electricity makes it unique among commodity assets, and it is an important driver of its price behaviour in secondary financial markets. The instantaneous and continuous matching of power supply with demand is a key factor explaining its volatility. During periods of high demand, costlier generation capabilities are utilised since electricity cannot be stored; this has the impact of driving prices up very quickly. Furthermore, the non-storability also complicates physical hedging. Owing to this, the problem of joint price-quantity risk in electricity markets is a commonly studied theme. To this end, we investigate the use of coregionalized (or multi-task) sparse Gaussian processes (GPs) for risk management in the context of power markets. GPs provide a versatile and elegant non-parametric approach for regression and time-series modelling. However, GPs scale poorly with the amount of training data due to a cubic complexity. These considerations suggest that knowledge transfer between price and load is vital for effective hedging, and that a computationally efficient method is required. To gauge the performance of our model, we use an average-load strategy as comparator. The latter is a robust approach commonly used by industry. If the spot and load are uncorrelated and Gaussian, then hedging with the expected load will result in the minimum variance position. The main contribution of our work is twofold: first, developing a multi-task sparse GP-based approach for hedging; second, demonstrating that our model-based strategy outperforms the comparator, and can thus be employed for effective hedging in electricity markets.
    Date: 2019–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1903.09536&r=all
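The statement in item 17 that hedging with the expected load gives the minimum variance position can be checked in a stylised one-period setting. The setup is an assumption made here for illustration, not taken from the paper: a retailer must buy a random load L at the random spot price P, and hedges by buying a quantity q forward at a fixed price F, so that the net cost is C(q). Spot and load are assumed jointly Gaussian, so that being uncorrelated implies independence.

```latex
\begin{align*}
C(q) &= P L - q\,(P - F), \\
\operatorname{Var}\bigl(C(q)\bigr) &= \operatorname{Var}(PL) - 2q\,\operatorname{Cov}(PL, P) + q^{2}\,\operatorname{Var}(P), \\
q^{*} &= \frac{\operatorname{Cov}(PL, P)}{\operatorname{Var}(P)}
  \quad\text{(first-order condition in } q\text{)}, \\
\operatorname{Cov}(PL, P) &= \mathbb{E}[P^{2}L] - \mathbb{E}[PL]\,\mathbb{E}[P]
  = \mathbb{E}[L]\bigl(\mathbb{E}[P^{2}] - \mathbb{E}[P]^{2}\bigr)
  = \mathbb{E}[L]\,\operatorname{Var}(P)
  \quad\text{(independence)}, \\
q^{*} &= \mathbb{E}[L].
\end{align*}
```

When price and load are correlated, the covariance term no longer factorises and the optimal quantity departs from the expected load, which is the situation a joint model of price and load is meant to address.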
  18. By: Allen, D.E.; McAleer, M.J.
    Abstract: This paper features an analysis of President Trump's two State of the Union addresses, which are analysed by means of various data mining techniques including sentiment analysis. The intention is to explore the contents and sentiments of the messages contained, the degree to which they differ, and their potential implications for the national mood and state of the economy. In order to provide a contrast and some parallel context, analyses are also undertaken of President Obama's last State of the Union address and Hitler's 1933 Berlin Proclamation. The structure of these four political addresses is remarkably similar. The three US Presidential speeches are more positive emotionally than Hitler's relatively shorter address, which is characterized by a prevalence of negative emotions. However, it should be said that the economic circumstances in contemporary America and Germany in the 1930s are vastly different.
    Keywords: Text Mining, Sentiment Analysis, Word Cloud, Emotional Valence
    JEL: C19 C65 D79
    Date: 2019–03–19
    URL: http://d.repec.org/n?u=RePEc:ems:eureir:115615&r=all
  19. By: Singh, Prachi; Dey, Sagnik; Chowdhury, Sourangsu
    Abstract: This paper examines the effect of outdoor air pollution on child health in India by combining satellite PM2.5 data with the geo-coded Demographic and Health Survey of India (2016). Pollution levels vary due to seasonal open biomass burning events (like crop-burning and forest fires), which are a common occurrence. Our identification strategy relies on spatial and temporal differences in these biomass burning events to identify the effect of air pollution on child health. Our results indicate that children exposed to higher levels of PM2.5 during their first trimester and during the post-natal period of first three months after birth have lower Height-for-age and Weight-for-age; the effect is not limited to just rural areas, but prominent for Northern states of India which have higher incidence of such events.
    Keywords: Health Economics and Policy
    Date: 2019–02
    URL: http://d.repec.org/n?u=RePEc:ags:aare19:285036&r=all
  20. By: Jonathan I. Dingel; Antonio Miscio; Donald R. Davis
    Abstract: In developed economies, agglomeration is skill-biased: larger cities are skill-abundant and exhibit higher skilled wage premia. This paper characterizes the spatial distributions of skills in Brazil, China, and India. To facilitate comparisons with developed-economy findings, we construct metropolitan areas for each of these economies by aggregating finer geographic units on the basis of contiguous areas of light in nighttime satellite images. Our results validate this procedure. These lights-based metropolitan areas mirror commuting-based definitions in the United States and Brazil. In China and India, which lack commuting-based definitions, lights-based metropolitan populations follow a power law, while administrative units do not. Examining variation in relative quantities and prices of skill across these metropolitan areas, we conclude that agglomeration is also skill-biased in Brazil, China, and India.
    JEL: C8 O1 O18 R1
    Date: 2019–03
    URL: http://d.repec.org/n?u=RePEc:nbr:nberwo:25678&r=all
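The idea in item 20 of building metropolitan areas from contiguous areas of light can be illustrated with a connected-component labelling of a brightness grid. This is a minimal sketch under stated assumptions, not the authors' procedure: it works on raw pixels instead of aggregating finer geographic units, uses 4-neighbour contiguity, and the toy grid, the threshold and the function name `lit_clusters` are hypothetical.

```python
from collections import deque


def lit_clusters(grid, threshold):
    """Label 4-connected groups of cells whose brightness is at least `threshold`."""
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]  # 0 means "not part of any cluster"
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] < threshold or labels[r][c]:
                continue
            # Start a new cluster and flood-fill it breadth-first.
            next_label += 1
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:
                i, j = queue.popleft()
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if (0 <= ni < rows and 0 <= nj < cols
                            and grid[ni][nj] >= threshold and not labels[ni][nj]):
                        labels[ni][nj] = next_label
                        queue.append((ni, nj))
    return labels, next_label


# Toy night-lights grid: two bright patches and one dim cell below the threshold.
lights = [
    [9, 8, 0, 0, 0],
    [7, 0, 0, 5, 6],
    [0, 0, 0, 6, 0],
    [4, 0, 0, 0, 0],
]
labels, n_clusters = lit_clusters(lights, threshold=5)
print(n_clusters)
for row in labels:
    print(row)
```

The brightness threshold controls how readily neighbouring settlements merge into one cluster, so any application would need to check how sensitive the resulting metropolitan definitions are to that choice.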

This nep-big issue is ©2019 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.