nep-big New Economics Papers
on Big Data
Issue of 2018‒12‒24
fifteen papers chosen by
Tom Coupé
University of Canterbury

  1. Classifying Firms with Text Mining By Giacomo Caterini
  2. Forecasting Tourist Arrivals with Google Trends and Mixed Frequency Data By Havranek, Tomas; Zeynalov, Ayaz
  3. Understanding AI Driven Innovation by Linked Database of Scientific Articles and Patents By MOTOHASHI Kazuyuki
  4. The U.S. Syndicated Loan Market : Matching Data By Gregory J. Cohen; Melanie Friedrichs; Kamran Gupta; William Hayes; Seung Jung Lee; W. Blake Marsh; Nathan Mislang; Maya Shaton; Martin Sicilian
  5. The U.S. Syndicated Loan Market: Matching Data By Cohen, Gregory J.; Friedrichs, Melanie; Gupta, Kamran; Hayes, William; Lee, Seung Jung; Marsh, W. Blake; Mislang, Nathan; Shaton, Maya; Sicilian, Martin
  6. Deep neural networks algorithms for stochastic control problems on finite horizon, Part 2: numerical applications By Achref Bachouch; Côme Huré; Nicolas Langrené; Huyen Pham
  7. The Dominium Mundi Game and the Case for Artificial Intelligence in Economics and the Law By Rodríguez Arosemena, Nicolás
  8. Calibrating rough volatility models: a convolutional neural network approach By Henry Stone
  9. How 'Big Data' affects competition law analysis in Online Platforms and Agriculture: does one size fit all? By Atik, Can
  10. Big Data, Computational Science, Economics, Finance, Marketing, Management, and Psychology: Connections By Chang, C-L.; McAleer, M.J.; Wong, W.-K.
  11. Bayesian Forecasting of Electoral Outcomes with new Parties' Competition By José García-Montalvo; Omiros Papaspiliopoulos; Timothée Stumpf-Fétizon
  12. Size matters: Estimation sample length and electricity price forecasting accuracy By Carlo Fezzi; Luca Mosetti
  13. Bayesian forecasting of electoral outcomes with new parties' competition By José Garcia Montalvo; Omiros Papaspiliopoulos; Timothée Stumpf-Fétizon
  14. The Development of Digital Technology for IT, IoT, Big Data, and AI in Japan's Fourth Industrial Revolution By KIMOTO Hiroshi; SAWATANI Yuriko; SAITO Naho; IWAMOTO Koichi; TANOUE Yuta; INOUE Yusuka
  15. Efficient Counterfactual Learning from Bandit Feedback By Yusuke Narita; Shota Yasui; Kohei Yata

  1. By: Giacomo Caterini
    Abstract: Statistics on the births, deaths and survival rates of firms are crucial pieces of information, as they enter as inputs into the computation of GDP, the identification of each sector’s contribution to the economy, and the assessment of gross job creation and destruction rates. Official statistics on firm demography, however, are made available only several months after data collection and storage. Furthermore, unprocessed and untimely administrative data can lead to a misrepresentation of the life-cycle stage of a firm. In this paper we implement an automated version of Eurostat’s algorithm aimed at distinguishing true startup endeavors from the resurrection of pre-existing but apparently defunct firms. The potential gains from combining machine learning, natural language processing and econometric tools for pre-processing and analyzing granular data are demonstrated, and a machine learning method for predicting reactivations of deceptively dead firms is proposed.
    Keywords: Business Demography; Classification; Text Mining
    JEL: C01 C52 C53 C80 G33 L11 L25 L26 M13 R11
    Date: 2018
    URL: http://d.repec.org/n?u=RePEc:trn:utwprg:2018/09&r=big
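    A minimal sketch of the kind of text-classification step described in item 1 above, in Python with scikit-learn; the input file, column names, and model choice are illustrative assumptions, not the author's Eurostat-based implementation.

      # Illustrative only: classify firm registry texts as genuine startups vs.
      # reactivations of apparently defunct firms. File and columns are hypothetical.
      import pandas as pd
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.linear_model import LogisticRegression
      from sklearn.metrics import classification_report
      from sklearn.model_selection import train_test_split
      from sklearn.pipeline import make_pipeline

      df = pd.read_csv("firm_registry_texts.csv")           # hypothetical input
      X_train, X_test, y_train, y_test = train_test_split(
          df["description"], df["is_reactivation"], test_size=0.2, random_state=0)

      model = make_pipeline(
          TfidfVectorizer(ngram_range=(1, 2), min_df=5),    # word/bigram features
          LogisticRegression(max_iter=1000))                # simple linear classifier
      model.fit(X_train, y_train)
      print(classification_report(y_test, model.predict(X_test)))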
  2. By: Havranek, Tomas; Zeynalov, Ayaz
    Abstract: In this paper, we examine the usefulness of Google Trends data in predicting monthly tourist arrivals and overnight stays in Prague during the period between January 2010 and December 2016. We offer two contributions. First, we analyze whether Google Trends provides significant forecasting improvements over models without search data. Second, we assess whether a high-frequency variable (weekly Google Trends) is more useful for accurate forecasting than a low-frequency variable (monthly tourist arrivals) using Mixed-data sampling (MIDAS). Our results stress the potential of Google Trends to offer more accurate prediction in the context of tourism: we find that Google Trends information, both two months and one week ahead of arrivals, is useful for predicting the actual number of tourist arrivals. The MIDAS forecasting model that employs weekly Google Trends data outperforms models using monthly Google Trends data and models without Google Trends data.
    Keywords: Google trends,mixed-frequency data,forecasting,tourism
    JEL: C53 L83
    Date: 2018
    URL: http://d.repec.org/n?u=RePEc:zbw:esprep:187420&r=big
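    A minimal sketch of the mixed-frequency (MIDAS) idea in item 2 above, using simulated data: weekly observations are collapsed into a monthly regressor through exponential Almon weights, one common MIDAS weighting scheme and not necessarily the authors' exact specification.

      # Illustrative MIDAS-style regression: monthly arrivals on weekly search data.
      # The series, lag length, and weighting scheme are assumptions for the sketch.
      import numpy as np
      from scipy.optimize import minimize

      rng = np.random.default_rng(0)
      T, K = 84, 8                                   # 84 months, 8 weekly lags
      x_weekly = rng.normal(size=T * 4 + K)          # stand-in for weekly Google Trends

      def weekly_block(t):                           # last K weekly obs before month t
          end = K + 4 * t
          return x_weekly[end - K:end][::-1]         # most recent lag first

      X = np.array([weekly_block(t) for t in range(T)])
      y = 2.0 + X @ np.exp(-0.3 * np.arange(K)) + rng.normal(scale=0.5, size=T)

      def almon_weights(theta1, theta2, K):
          j = np.arange(1, K + 1)
          w = np.exp(theta1 * j + theta2 * j ** 2)
          return w / w.sum()

      def sse(params):
          b0, b1, t1, t2 = params
          yhat = b0 + b1 * X @ almon_weights(t1, t2, K)
          return np.sum((y - yhat) ** 2)

      res = minimize(sse, x0=[0.0, 1.0, 0.0, 0.0], method="Nelder-Mead")
      print("estimated (b0, b1, theta1, theta2):", res.x)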
  3. By: MOTOHASHI Kazuyuki
    Abstract: The linked dataset of AI research articles and patents reveals that substantial contributions by the public sector are found in AI development. In addition, the role of researchers who are involved both in publication and patent activities, particularly in the private sector, increased over time. That is, open science that is publicly available in research articles, and proprietary technology that is protected by patents, are intertwined in AI development. In addition, the impact of AI, combined with big data and IoT, which are defined as "new" IT innovations, is discussed by comparing it with traditional IT, which only consists of the technological progress of computer hardware and software developments. Both new and traditional IT can be understood by using the framework of GPT (general purpose technology), while the organization of new IT innovation can be characterized as an emergence of multiple ecosystems, instead of being organized in the pattern of platform leadership found in traditional IT.
    Date: 2018–11
    URL: http://d.repec.org/n?u=RePEc:eti:polidp:18017&r=big
  4. By: Gregory J. Cohen; Melanie Friedrichs; Kamran Gupta; William Hayes; Seung Jung Lee; W. Blake Marsh; Nathan Mislang; Maya Shaton; Martin Sicilian
    Abstract: We introduce a new software package for determining linkages between datasets without common identifiers. We apply these methods to three datasets commonly used in academic research on syndicated lending: Refinitiv LPC DealScan, the Shared National Credit Database, and S&P Global Market Intelligence Compustat. We benchmark the results of our match using results from the literature and previously matched files that are publicly available. We find that the company level matching is enhanced by careful cleaning of the data and by considering hierarchical relationships. For loan level matching, a tailored approach based on a good understanding of the data can be better in certain dimensions than a purer machine learning approach. The R package for the company level match can be found on GitHub.
    Keywords: Bank credit ; Company level matching ; Loan level matching ; Probabilistic matching ; Syndicated loans
    JEL: C88 G21 E44
    Date: 2018–12–07
    URL: http://d.repec.org/n?u=RePEc:fip:fedgfe:2018-85&r=big
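    The authors' package for items 4 and 5 is in R (fedmatch, see the GitHub link in the next entry); below is a minimal Python sketch of the general idea of company-level fuzzy matching after name cleaning, using standard-library string similarity rather than the package's actual routines. The example names, suffix list, and 0.85 cutoff are illustrative.

      # Illustrative fuzzy company-name matching: clean names, then score string
      # similarity and keep the best match above a threshold.
      import re
      from difflib import SequenceMatcher

      SUFFIXES = {"inc", "incorporated", "corp", "corporation", "co", "llc", "ltd"}

      def clean(name):
          tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
          return " ".join(t for t in tokens if t not in SUFFIXES)

      def best_match(name, candidates, cutoff=0.85):
          scored = [(SequenceMatcher(None, clean(name), clean(c)).ratio(), c)
                    for c in candidates]
          score, match = max(scored)
          return (match, score) if score >= cutoff else (None, score)

      dealscan_names = ["Acme Manufacturing Inc.", "Globex Corp"]
      compustat_names = ["ACME MANUFACTURING CORPORATION", "Globex Corporation",
                         "Initech LLC"]
      for n in dealscan_names:
          print(n, "->", best_match(n, compustat_names))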
  5. By: Cohen, Gregory J.; Friedrichs, Melanie; Gupta, Kamran (Federal Reserve Bank of Kansas City); Hayes, William; Lee, Seung Jung; Marsh, W. Blake (Federal Reserve Bank of Kansas City); Mislang, Nathan; Shaton, Maya; Sicilian, Martin
    Abstract: We introduce a new software package for determining linkages between datasets without common identifiers. We apply these methods to three datasets commonly used in academic research on syndicated lending: Refinitiv LPC DealScan, the Shared National Credit Database, and S&P Global Market Intelligence Compustat. We benchmark the results of our match using results from the literature and previously matched files that are publicly available. We find that the company level matching is enhanced by careful cleaning of the data and by considering hierarchical relationships. For loan level matching, a tailored approach based on a good understanding of the data can be better in certain dimensions than a purer machine learning approach. The R package for the company level match can be found on GitHub at https://github.com/seunglee98/fedmatch.
    Keywords: Bank Credit; Syndicated Loans; Probabilistic Matching; Company Level Matching; Loan Level Matching
    JEL: C88 E44 G21
    Date: 2018–12–03
    URL: http://d.repec.org/n?u=RePEc:fip:fedkrw:rwp18-09&r=big
  6. By: Achref Bachouch (UiO); Côme Huré (LPSM UMR 8001, UPD7); Nicolas Langrené (CSIRO); Huyen Pham (LPSM UMR 8001, UPD7)
    Abstract: This paper presents several numerical applications of the deep learning-based algorithms analyzed in [11]. Numerical and comparative tests using TensorFlow illustrate the performance of our different algorithms, namely control learning by performance iteration (algorithms NNcontPI and ClassifPI) and control learning by hybrid iteration (algorithms Hybrid-Now and Hybrid-LaterQ), on the 100-dimensional nonlinear PDE examples from [6] and on quadratic backward stochastic differential equations as in [5]. We also provide numerical results for an option hedging problem in finance, and for energy storage problems arising in the valuation of gas storage and in microgrid management.
    Date: 2018–12
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1812.05916&r=big
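    A toy sketch of the general approach behind item 6 (parametrizing the control by a neural network and minimizing a simulated expected cost by stochastic gradient descent), shown on a one-period linear-quadratic problem with a known analytic solution; it is not the NNcontPI, ClassifPI or Hybrid algorithms of the paper, and all model parameters are made up.

      # Learn a feedback control a(x) with a small network by minimizing the
      # Monte Carlo estimate of E[X1^2 + lam * a^2], where X1 = X0 + a + sigma*eps.
      # The analytic optimum is a*(x) = -x / (1 + lam).
      import tensorflow as tf

      lam, sigma = 0.5, 0.1
      policy = tf.keras.Sequential([
          tf.keras.layers.Dense(32, activation="tanh"),
          tf.keras.layers.Dense(1)])
      opt = tf.keras.optimizers.Adam(1e-2)

      for step in range(2000):
          x0 = tf.random.normal((256, 1))              # sampled initial states
          eps = tf.random.normal((256, 1))             # noise in the transition
          with tf.GradientTape() as tape:
              a = policy(x0)                           # candidate control
              x1 = x0 + a + sigma * eps                # controlled next state
              cost = tf.reduce_mean(x1 ** 2 + lam * a ** 2)
          grads = tape.gradient(cost, policy.trainable_variables)
          opt.apply_gradients(zip(grads, policy.trainable_variables))

      x = tf.constant([[1.0]])
      print("learned a(1.0):", policy(x).numpy()[0, 0],
            "  analytic:", -1.0 / (1 + lam))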
  7. By: Rodríguez Arosemena, Nicolás
    Abstract: This paper presents two conjectures that are the product of the reconciliation between modern economics and the long-standing jurisprudential tradition originated in Ancient Rome, whose influence is still pervasive in most of the world's legal systems. We show how these conjectures together with the theory that supports them can provide us with a powerful normative mean to solve the world's most challenging problems such as financial crises, poverty, wars, man-made environmental catastrophes and preventable deaths. The core of our theoretical framework is represented by a class of imperfect information game built completely on primitives (self-interest, human fallibility and human sociability) that we have called the Dominium Mundi Game (DMG) for reasons that will become obvious. Given the intrinsic difficulties that arise in solving this type of models, we advocate for the use of artificial intelligence as a potentially feasible method to determine the implications of the definitions and assumptions derived from the DMG's framework.
    Keywords: Game Theory; Artificial Intelligence; Dynamic Programming Squared; Imperfect Information Games; Law and Economics
    JEL: C7 C73 D6 K0
    Date: 2018–12–15
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:90560&r=big
  8. By: Henry Stone
    Abstract: In this paper we use convolutional neural networks to find the Hölder exponent of simulated sample paths of the rBergomi model, a recently proposed stock price model used in mathematical finance. We contextualise this as a calibration problem, thereby providing a very practical and useful application.
    Date: 2018–12
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1812.05315&r=big
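    A toy analogue of the calibration idea in item 8: a one-dimensional convolutional network regresses the Hurst/Hölder parameter H from simulated path increments. Fractional Brownian motion is used as a stand-in for full rBergomi simulation, and the architecture and sample sizes are illustrative guesses, not the author's.

      # Simulate fBm paths (whose Hölder exponent is H), then train a small
      # Conv1D regressor on the path increments to recover H.
      import numpy as np
      import tensorflow as tf

      def fbm_path(H, rng, n=100):
          """One fBm path on a unit grid via Cholesky of the fBm covariance."""
          t = np.arange(1, n + 1) / n
          cov = 0.5 * (t[:, None] ** (2 * H) + t[None, :] ** (2 * H)
                       - np.abs(t[:, None] - t[None, :]) ** (2 * H))
          L = np.linalg.cholesky(cov + 1e-10 * np.eye(n))
          return L @ rng.standard_normal(n)

      rng = np.random.default_rng(0)
      H_train = rng.uniform(0.05, 0.45, size=500)           # "rough" regime H < 1/2
      X_train = np.stack([np.diff(fbm_path(H, rng), prepend=0.0)
                          for H in H_train])[..., None]      # shape (500, 100, 1)

      model = tf.keras.Sequential([
          tf.keras.layers.Conv1D(16, 5, activation="relu"),
          tf.keras.layers.Conv1D(16, 5, activation="relu"),
          tf.keras.layers.GlobalAveragePooling1D(),
          tf.keras.layers.Dense(1)])
      model.compile(optimizer="adam", loss="mse")
      model.fit(X_train, H_train[:, None], epochs=20, batch_size=32, verbose=0)

      test_path = np.diff(fbm_path(0.1, rng), prepend=0.0)[None, :, None]
      print("predicted H for a path with true H = 0.1:",
            float(model.predict(test_path, verbose=0)[0, 0]))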
  9. By: Atik, Can
    Date: 2018
    URL: http://d.repec.org/n?u=RePEc:zbw:itse18:184928&r=big
  10. By: Chang, C-L.; McAleer, M.J.; Wong, W.-K.
    Abstract: The paper provides a review of the literature that connects Big Data, Computational Science, Economics, Finance, Marketing, Management, and Psychology, and discusses some research that is related to the seven disciplines. Academics could develop theoretical models and subsequent econometric and statistical models to estimate the parameters in the associated models, as well as conduct simulations to examine whether the estimators and hypothesis tests in their theories have good size and high power. Thereafter, academics and practitioners could apply theory to analyse some interesting issues in the seven disciplines and cognate areas.
    Keywords: Big Data, Computational science, Economics, Finance, Management, Theoretical models, Econometric and statistical models, Applications.
    Date: 2018–01–01
    URL: http://d.repec.org/n?u=RePEc:ems:eureir:112499&r=big
  11. By: José García-Montalvo; Omiros Papaspiliopoulos; Timothée Stumpf-Fétizon
    Abstract: We propose a new methodology for predicting electoral results that combines a fundamental model and national polls within an evidence synthesis framework. Although novel, the methodology builds upon basic statistical structures, largely modern analysis of variance type models, and it is carried out in open-source software. The methodology is largely motivated by the specific challenges of forecasting elections with the participation of new political parties, which is becoming increasingly common in the post-2008 European panorama. Our methodology is also particularly useful for the allocation of parliamentary seats, since the vast majority of available opinion polls predict at the national level whereas seats are allocated at the local level. We illustrate the advantages of our approach relative to recent competing approaches using the 2015 Spanish Congressional Election. In general, the predictions of our model outperform the alternative specifications, including hybrid models that combine fundamentals-based and poll-based models. Our forecasts are, in relative terms, particularly accurate in predicting the seats obtained by each political party.
    Keywords: multilevel model, Bayesian machine learning, inverse regression, evidence synthesis, elections
    JEL: C11 C53 C63 D72
    Date: 2018–12
    URL: http://d.repec.org/n?u=RePEc:bge:wpaper:1065&r=big
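    A minimal sketch of the evidence-synthesis idea in item 11: a fundamentals-based prior for a party's national vote share is combined with poll information by conjugate normal updating. All numbers are made up, and the paper's actual model is a much richer Bayesian multilevel specification that also maps national shares into seats.

      # Combine a fundamentals prior with polls via precision-weighted (conjugate
      # normal) updating. The means and standard deviations are assumptions.
      import numpy as np

      prior_mean, prior_sd = 0.22, 0.04          # fundamentals model (assumed)
      polls = np.array([0.205, 0.215, 0.230])    # hypothetical national polls
      poll_sd = 0.015                            # assumed sampling + house-effect error

      prior_prec = 1 / prior_sd ** 2
      poll_prec = len(polls) / poll_sd ** 2
      post_prec = prior_prec + poll_prec
      post_mean = (prior_prec * prior_mean + poll_prec * polls.mean()) / post_prec
      print(f"posterior vote share: {post_mean:.3f} +/- {post_prec ** -0.5:.3f}")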
  12. By: Carlo Fezzi; Luca Mosetti
    Abstract: Electricity price forecasting models are typically estimated via rolling windows, i.e. by using only the most recent observations. Nonetheless, the current literature does not provide much guidance on how to select the size of such windows. This paper shows that determining the appropriate window prior to estimation dramatically improves forecasting performances. In addition, it proposes a simple two-step approach to choose the best performing models and window sizes. The value of this methodology is illustrated by analyzing hourly datasets from two large power markets with a selection of ten different forecasting models. Incidentally, our empirical application reveals that simple models, such as the linear regression, can perform surprisingly well if estimated on extremely short samples.
    Keywords: electricity price forecasting, day-ahead market, parameter instability, bandwidth selection, artificial neural networks
    JEL: C22 C45 C51 C53 Q47
    Date: 2018
    URL: http://d.repec.org/n?u=RePEc:trn:utwprg:2018/10&r=big
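    A minimal sketch of the window-length question in item 12, on a synthetic hourly price series: for each candidate estimation window, a simple regression on the price 24 hours earlier is re-estimated on the most recent observations and scored by one-step-ahead validation MAE. The series, model, and candidate windows are stand-ins for the paper's setup.

      # Illustrative rolling-window length selection for a day-ahead price forecast.
      import numpy as np

      rng = np.random.default_rng(1)
      n = 1200
      prices = (50 + 0.1 * np.cumsum(rng.normal(scale=2, size=n))
                + 10 * np.sin(np.arange(n) * 2 * np.pi / 24))   # toy hourly prices

      def one_step_forecast(history, lags=24):
          """OLS of p_t on (1, p_{t-lags}) over the window, then forecast next point."""
          y = history[lags:]
          X = np.column_stack([np.ones(len(y)), history[:-lags]])
          beta, *_ = np.linalg.lstsq(X, y, rcond=None)
          return beta[0] + beta[1] * history[-lags]

      val_start = 1000
      for w in (100, 200, 400, 800):                              # candidate windows
          errs = [abs(prices[t] - one_step_forecast(prices[t - w:t]))
                  for t in range(val_start, n)]
          print(f"window {w:4d}: validation MAE = {np.mean(errs):.3f}")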
  13. By: José Garcia Montalvo; Omiros Papaspiliopoulos; Timothée Stumpf-Fétizon
    Abstract: We propose a new methodology for predicting electoral results that combines a fundamental model and national polls within an evidence synthesis framework. Although novel, the methodology builds upon basic statistical structures, largely modern analysis of variance type models, and it is carried out in open-source software. The methodology is largely motivated by the specific challenges of forecasting elections with the participation of new political parties, which is becoming increasingly common in the post-2008 European panorama. Our methodology is also particularly useful for the allocation of parliamentary seats, since the vast majority of available opinion polls predict at the national level whereas seats are allocated at the local level. We illustrate the advantages of our approach relative to recent competing approaches using the 2015 Spanish Congressional Election. In general, the predictions of our model outperform the alternative specifications, including hybrid models that combine fundamentals-based and poll-based models. Our forecasts are, in relative terms, particularly accurate in predicting the seats obtained by each political party.
    Keywords: Multilevel models, Bayesian machine learning, inverse regression, evidence synthesis, elections
    Date: 2018–12
    URL: http://d.repec.org/n?u=RePEc:upf:upfgen:1624&r=big
  14. By: KIMOTO Hiroshi; SAWATANI Yuriko; SAITO Naho; IWAMOTO Koichi; TANOUE Yuta; INOUE Yusuka
    Abstract: This paper describes the first scientific baseline survey data on the current state of development of digital technology for information technology (IT), the Internet of Things (IoT), big data, and artificial intelligence (AI) in Japan's fourth industrial revolution, and reports the results of the analysis of that survey. Large companies are trying to develop new businesses using digital technology, but small and medium-sized enterprises still lag behind, and the differences across industrial sectors are even clearer. If the survey is repeated every two years, a dynamic picture of these developments will become possible, which is its primary purpose.
    Date: 2018–12
    URL: http://d.repec.org/n?u=RePEc:eti:rpdpjp:18019&r=big
  15. By: Yusuke Narita (Cowles Foundation, Yale University); Shota Yasui (CyberAgent Inc.); Kohei Yata (Yale University)
    Abstract: What is the most statistically efficient way to do off-policy optimization with batch data from bandit feedback? For log data generated by contextual bandit algorithms, we consider offline estimators for the expected reward from a counterfactual policy. Our estimators are shown to have the lowest variance in a wide class of estimators, achieving variance reduction relative to standard estimators. We then apply our estimators to improve advertisement design by a major advertisement company. Consistent with the theoretical result, our estimators allow us to improve on the existing bandit algorithm with more statistical confidence compared to a state-of-the-art benchmark.
    Keywords: Machine Learning, Artificial Intelligence, Bandit Algorithm, Counterfactual Prediction, Propensity Score, Semiparametric Efficiency Bound, Advertisement Design
    Date: 2018–12
    URL: http://d.repec.org/n?u=RePEc:cwl:cwldpp:2155&r=big
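    A minimal sketch of off-policy (counterfactual) evaluation from logged bandit data, as discussed in item 15: the standard inverse-propensity-weighted (IPW) and doubly robust (DR) estimators of a target policy's expected reward, on simulated logs. The paper's contribution is a more refined estimator attaining the semiparametric efficiency bound; the data-generating process and the uniform target policy below are illustrative assumptions.

      # Simulate logged bandit feedback, then estimate the value of a uniform
      # counterfactual policy with IPW and a (crude) doubly robust estimator.
      import numpy as np

      rng = np.random.default_rng(0)
      n, n_actions = 10000, 3
      contexts = rng.normal(size=(n, 2))

      def logging_probs(x):                    # behaviour policy (known propensities)
          logits = x @ np.array([[1.0, -1.0, 0.0], [0.0, 1.0, -1.0]])
          e = np.exp(logits - logits.max(axis=1, keepdims=True))
          return e / e.sum(axis=1, keepdims=True)

      p_log = logging_probs(contexts)
      actions = np.array([rng.choice(n_actions, p=p) for p in p_log])
      true_mean = contexts[:, :1] * np.array([0.5, 1.0, -0.5]) + 1.0   # E[reward|x,a]
      rewards = true_mean[np.arange(n), actions] + rng.normal(scale=0.5, size=n)

      target = np.full((n, n_actions), 1.0 / n_actions)   # counterfactual: uniform policy
      prop = p_log[np.arange(n), actions]
      w = target[np.arange(n), actions] / prop            # importance weights

      ipw = np.mean(w * rewards)
      q_hat = np.full((n, n_actions), rewards.mean())     # crude reward model for DR
      dr = np.mean((target * q_hat).sum(axis=1)
                   + w * (rewards - q_hat[np.arange(n), actions]))
      print(f"IPW estimate: {ipw:.3f}   DR estimate: {dr:.3f}")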

This nep-big issue is ©2018 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.