nep-big New Economics Papers
on Big Data
Issue of 2019‒05‒06
eighteen papers chosen by
Tom Coupé
University of Canterbury

  1. Machine Learning Methods Economists Should Know About By Athey, Susan; Imbens, Guido W.
  2. Illuminating Economic Growth By Yingyao Hu; Jiaxiong Yao
  3. The Wrong Kind of AI? Artificial Intelligence and the Future of Labor Demand By Acemoglu, Daron; Restrepo, Pascual
  4. Boosting the Hodrick-Prescott Filter By Peter C. B. Phillips; Zhentao Shi
  5. Curriculum Learning in Deep Neural Networks for Financial Forecasting By Allison Koenecke; Amita Gajewar
  6. Gated deep neural networks for implied volatility surfaces By Yu Zheng; Yongxin Yang; Bowei Chen
  7. Decomposition of intra-household disparity sensitive fuzzy multi-dimensional poverty index: A study of vulnerability through Machine Learning By Sen, Sugata
  8. Supervised Machine Learning for Eliciting Individual Reservation Values By John A. Clithero; Jae Joon Lee; Joshua Tasoff
  9. The digital innovation policy landscape in 2019 By Caroline Paunov; Sandra Planes-Satorra
  10. Artificial Intelligence: Socio-Political Challenges of Delegating Human Decision-Making to Machines By Braun, Robert
  11. Predicting High-Risk Opioid Prescriptions Before they are Given By Justine S. Hastings; Mark Howison; Sarah E. Inman
  12. Data Analytics in Operations Management: A Review By Velibor V. Mišić; Georgia Perakis
  13. Legal Responsibility in Investment Decisions Using Algorithms and AI By Makoto Chiba; Mikari Kashima; Kenta Sekiguchi
  14. A New Organizational Chassis for Artificial Intelligence - Exploring Organizational Readiness Factors By Pumplun, Luisa; Tauchert, Christoph; Heidt, Margareta
  15. Regulating AI: do we need new tools? By Otello Ardovino; Jacopo Arpetti; Marco Delmastro
  16. Opting out of Workers' Compensation: Non-Subscription in Texas and Its Effects By Jinks, Lu; Kniesner, Thomas J.; Leeth, John D.; Lo Sasso, Anthony T.
  17. Forecasting Realized Volatility of Russian stocks using Google Trends and Implied Volatility By Bazhenov, Timofey; Fantazzini, Dean
  18. Causally Driven Incremental Multi Touch Attribution Using a Recurrent Neural Network By Du, Ruihuan; Zhong, Yu; Nair, Harikesh S.; Cui, Bo; Shou, Ruyang

  1. By: Athey, Susan (Graduate School of Business, Stanford University, SIEPR, and NBER); Imbens, Guido W. (Graduate School of Business and Department of Economics, Stanford)
    Abstract: We discuss the relevance of the recent Machine Learning (ML) literature for economics and econometrics. First we discuss the differences in goals, methods and settings between the ML literature and the traditional econometrics and statistics literatures. Then we discuss some specific methods from the machine learning literature that we view as important for empirical researchers in economics. These include supervised learning methods for regression and classification, unsupervised learning methods, as well as matrix completion methods. Finally, we highlight newly developed methods at the intersection of ML and econometrics, methods that typically perform better than either off-the-shelf ML or more traditional econometric methods when applied to particular classes of problems, problems that include causal inference for average treatment effects, optimal policy estimation, and estimation of the counterfactual effect of price changes in consumer choice models.
    Date: 2019–03
    URL: http://d.repec.org/n?u=RePEc:ecl:stabus:3776&r=all
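    A minimal Python sketch of one method class the survey covers: regularized supervised learning (the LASSO) for regression, with the penalty chosen by cross-validation. The synthetic data and the scikit-learn implementation are illustrative assumptions by the editor, not material from the paper.
      import numpy as np
      from sklearn.linear_model import LassoCV
      from sklearn.model_selection import train_test_split

      rng = np.random.default_rng(0)
      n, p = 500, 50                        # many candidate regressors, few relevant
      X = rng.normal(size=(n, p))
      beta = np.zeros(p)
      beta[:5] = 1.0                        # sparse true coefficient vector
      y = X @ beta + rng.normal(size=n)

      X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
      model = LassoCV(cv=5).fit(X_tr, y_tr)            # penalty chosen by cross-validation
      print("out-of-sample R^2:", model.score(X_te, y_te))
      print("selected regressors:", np.flatnonzero(model.coef_))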
  2. By: Yingyao Hu; Jiaxiong Yao
    Abstract: This paper seeks to illuminate the uncertainty in official GDP per capita measures using auxiliary data. Using satellite-recorded nighttime lights as an additional measurement of true GDP per capita, we provide a statistical framework, in which the error in official GDP per capita may depend on the country’s statistical capacity and the relationship between nighttime lights and true GDP per capita can be nonlinear and vary with geographic location. This paper uses recently developed results for measurement error models to identify and estimate the nonlinear relationship between nighttime lights and true GDP per capita and the nonparametric distribution of errors in official GDP per capita data. We then construct more precise and robust measures of GDP per capita using nighttime lights, official national accounts data, statistical capacity, and geographic locations. We find that GDP per capita measures are less precise for middle and low income countries and nighttime lights can play a bigger role in improving such measures.
    Date: 2019–04–09
    URL: http://d.repec.org/n?u=RePEc:imf:imfwpa:19/77&r=all
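    In an editorial paraphrase of the framework described above (the symbols are illustrative, not the paper's notation), official GDP per capita Y and nighttime lights Z are treated as two error-laden measurements of true GDP per capita Y*:
        Y = Y* + e,               with the distribution of e allowed to depend on statistical capacity,
        Z = m(Y*, location) + u,  with m(.) nonlinear and varying with geographic location.
    Identifying m(.) and the distribution of e then allows the two measurements to be combined into a more precise estimate of Y*.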
  3. By: Acemoglu, Daron (MIT); Restrepo, Pascual (Boston University)
    Abstract: Artificial Intelligence is set to influence every aspect of our lives, not least the way production is organized. AI, as a technology platform, can automate tasks previously performed by labor or create new tasks and activities in which humans can be productively employed. Recent technological change has been biased towards automation, with insufficient focus on creating new tasks where labor can be productively employed. The consequences of this choice have been stagnating labor demand, declining labor share in national income, rising inequality and lower productivity growth. The current tendency is to develop AI in the direction of further automation, but this might mean missing out on the promise of the "right" kind of AI with better economic and social outcomes.
    Keywords: automation, artificial intelligence, jobs, inequality, innovation, labor demand, productivity, tasks, technology, wages
    JEL: J23 J24
    Date: 2019–04
    URL: http://d.repec.org/n?u=RePEc:iza:izadps:dp12292&r=all
  4. By: Peter C. B. Phillips; Zhentao Shi
    Abstract: The Hodrick-Prescott (HP) filter is one of the most widely used econometric methods in applied macroeconomic research. The technique is nonparametric and seeks to decompose a time series into a trend and a cyclical component unaided by economic theory or prior trend specification. Like all nonparametric methods, the HP filter depends critically on a tuning parameter that controls the degree of smoothing. Yet in contrast to modern nonparametric methods and applied work with these procedures, empirical practice with the HP filter almost universally relies on standard settings for the tuning parameter that have been suggested largely by experimentation with macroeconomic data and heuristic reasoning about the form of economic cycles and trends. As recent research has shown, standard settings may not be adequate in removing trends, particularly stochastic trends, in economic data. This paper proposes an easy-to-implement practical procedure of iterating the HP smoother that is intended to make the filter a smarter smoothing device for trend estimation and trend elimination. We call this iterated HP technique the boosted HP filter in view of its connection to L2-boosting in machine learning. The paper develops limit theory to show that the boosted HP filter asymptotically recovers trend mechanisms that involve unit root processes, deterministic polynomial drifts, and polynomial drifts with structural breaks -- the most common trends that appear in macroeconomic data and current modeling methodology. A stopping criterion is used to automate the iterative HP algorithm, making it a data-determined method that is ready for modern data-rich environments in economic research. The methodology is illustrated using three real data examples that highlight the differences between simple HP filtering, the data-determined boosted filter, and an alternative autoregressive approach.
    Date: 2019–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1905.00175&r=all
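    A minimal Python sketch of the boosting idea described above: apply the HP smoother repeatedly to the remaining cycle and accumulate the extracted trend. The fixed iteration count below is an illustrative simplification; the paper develops a data-determined stopping criterion.
      import numpy as np
      from scipy.sparse import eye, diags
      from scipy.sparse.linalg import spsolve

      def hp_trend(y, lam=1600.0):
          """One pass of the standard Hodrick-Prescott smoother."""
          n = len(y)
          D = diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n))   # second-difference operator
          A = (eye(n) + lam * (D.T @ D)).tocsc()
          return spsolve(A, y)

      def boosted_hp(y, lam=1600.0, iterations=5):
          cycle = np.asarray(y, dtype=float).copy()
          trend = np.zeros_like(cycle)
          for _ in range(iterations):        # each pass smooths the remaining cycle again
              step = hp_trend(cycle, lam)
              trend += step
              cycle -= step
          return trend, cycle

      rng = np.random.default_rng(0)
      y = np.cumsum(rng.normal(size=200)) + rng.normal(scale=0.5, size=200)   # toy series with a stochastic trend
      trend, cycle = boosted_hp(y)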
  5. By: Allison Koenecke; Amita Gajewar
    Abstract: For any financial organization, computing accurate quarterly forecasts for various products is one of the most critical operations. As the granularity at which forecasts are needed increases, traditional statistical time series models may not scale well. We apply deep neural networks in the forecasting domain by experimenting with techniques from Natural Language Processing (Encoder-Decoder LSTMs) and Computer Vision (Dilated CNNs), as well as incorporating transfer learning. A novel contribution of this paper is the application of curriculum learning to neural network models built for time series forecasting. We illustrate the performance of our models using Microsoft's revenue data corresponding to Enterprise, and Small, Medium & Corporate products, spanning approximately 60 regions across the globe for 8 different business segments, and totaling on the order of tens of billions of USD. We compare our models' performance to the ensemble model of traditional statistics and machine learning techniques currently used by Microsoft Finance. With this in-production model as a baseline, our experiments yield an approximately 30% improvement in overall accuracy on test data. We find that our curriculum learning LSTM-based model performs best, showing that it is reasonable to implement our proposed methods without overfitting on medium-sized data.
    Date: 2019–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1904.12887&r=all
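    A schematic Python sketch of the curriculum-learning idea applied to time-series windows: rank training windows by a difficulty proxy and widen the training set from easy to hard across stages. The difficulty proxy and the small MLP stand-in are editorial assumptions; the paper uses encoder-decoder LSTMs and dilated CNNs on Microsoft revenue data.
      import numpy as np
      from sklearn.neural_network import MLPRegressor

      rng = np.random.default_rng(0)
      series = np.cumsum(rng.normal(size=2000))          # placeholder revenue-like series

      window = 12
      X = np.stack([series[i:i + window] for i in range(len(series) - window)])
      y = series[window:]

      difficulty = X.std(axis=1)                         # proxy: noisier windows are "harder"
      order = np.argsort(difficulty)
      X, y = X[order], y[order]

      model = MLPRegressor(hidden_layer_sizes=(32,))
      n_stages = 4
      for stage in range(1, n_stages + 1):               # gradually admit harder examples
          cutoff = int(len(X) * stage / n_stages)
          for _ in range(20):                            # a few passes per curriculum stage
              model.partial_fit(X[:cutoff], y[:cutoff])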
  6. By: Yu Zheng; Yongxin Yang; Bowei Chen
    Abstract: In this paper, we propose a gated deep neural network model to predict implied volatility surfaces. Conventional financial conditions and empirical regularities of implied volatility, including the absence of static arbitrage, boundary conditions, asymptotic slope and the volatility smile, are incorporated into the design and calibration of the neural network architecture. These conditions are also satisfied empirically by option data on the S&P 500 over a ten-year period. Our proposed model outperforms the widely used surface stochastic volatility inspired model on mean average percentage error in both in-sample and out-of-sample datasets. Methodologically, the study contributes to the emerging trend of applying state-of-the-art information technology to business research: the model provides a framework for integrating data-driven machine learning algorithms with financial theories, and this framework can easily be extended and applied to other problems in finance and other business fields.
    Date: 2019–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1904.12834&r=all
  7. By: Sen, Sugata
    Abstract: Traditional multidimensional measures have failed to properly capture the vulnerability of human beings to poverty. Among the reasons for this shortcoming are the failure of existing measures to recognise the gradual nature of poverty and the disparities in wealth distribution within the household. This work therefore develops a measure of households' vulnerability to becoming poor from a multidimensional perspective, incorporating intra-household disparities and graduality in the causal factors. Dimensional decomposition of the proposed vulnerability measure is also within the scope of this work. An integrated mathematical framework is developed to estimate vulnerability and dimensional influences with the help of artificial intelligence.
    Keywords: Poverty, Vulnerability, Fuzzy logic, Intra-household disparity, Shapley Value Decomposition, Machine Learning, LIME
    JEL: C63 I32
    Date: 2019–04–28
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:93550&r=all
  8. By: John A. Clithero; Jae Joon Lee; Joshua Tasoff
    Abstract: Direct elicitation, guided by theory, is the standard method for eliciting individual-level latent variables. We present an alternative approach, supervised machine learning (SML), and apply it to measuring individual valuations for goods. We find that the approach is superior for predicting out-of-sample individual purchases relative to a canonical direct-elicitation approach, the Becker-DeGroot-Marschak (BDM) method. The BDM is imprecise and systematically biased by understating valuations. We characterize the performance of SML using a variety of estimation methods and data. The simulation results suggest that prices set by SML would increase revenue by 22% over the BDM, using the same data.
    Date: 2019–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1904.13329&r=all
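    A schematic Python illustration of the supervised-ML approach: fit a purchase classifier on individual features and posted prices, then choose the price that maximizes predicted expected revenue. The synthetic data and the gradient-boosting model are illustrative assumptions, not the authors' specification.
      import numpy as np
      from sklearn.ensemble import GradientBoostingClassifier

      rng = np.random.default_rng(0)
      n = 2000
      features = rng.normal(size=(n, 3))                   # observed individual characteristics
      valuations = np.exp(0.5 * features[:, 0]) + rng.gamma(2.0, 1.0, size=n)
      prices = rng.uniform(0, 8, size=n)                   # posted prices seen in the data
      bought = (valuations >= prices).astype(int)

      X = np.column_stack([features, prices])
      clf = GradientBoostingClassifier().fit(X, bought)

      # expected revenue over a grid of candidate prices for one individual
      grid = np.linspace(0.5, 8, 30)
      x0 = np.column_stack([np.tile(features[0], (len(grid), 1)), grid])
      expected_revenue = grid * clf.predict_proba(x0)[:, 1]
      print("revenue-maximizing posted price:", grid[expected_revenue.argmax()])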
  9. By: Caroline Paunov; Sandra Planes-Satorra
    Abstract: How are OECD countries supporting digital innovation and ensuring that benefits spread across the economy? This paper explores the current landscape of strategies and initiatives implemented in OECD countries to support innovation in the digital age. It identifies common trends and differences in national digital, smart industry and artificial intelligence (AI) strategies. The paper also discusses policy instruments used across the OECD to support digital innovation, targeting four objectives: first, policies aimed at enhancing digital technology adoption and diffusion, including demonstration facilities for SMEs; second, initiatives that promote collaborative innovation, including via the creation of digital innovation clusters and knowledge intermediaries; third, support for research and innovation in key digital technologies, particularly AI (e.g. by establishing testbeds and regulatory sandboxes); and fourth, policies to encourage digital entrepreneurship (e.g. through early-stage business acceleration support).
    Keywords: digital innovation, digital technologies and artificial intelligence (AI), innovation and research policy, innovation strategies
    JEL: O30 O31 O33 O38 O25 I28
    Date: 2019–05–06
    URL: http://d.repec.org/n?u=RePEc:oec:stiaac:71-en&r=all
  10. By: Braun, Robert (Institute for Advanced Studies, Vienna, Techno Science and Societal Transformation Research Group)
    Abstract: Artificial intelligence is at the heart of current debates related to ethical, social and political issues of technological innovation. This briefing refocuses attention from the techno-ethical challenges of AI to artificial decision-making (ADM) and the questions related to delegating human decisions to ADM. It is argued that (a) from a socio-ethical point of view the delegation is more relevant than the actual ethical problems of AI systems; (b) instead of traditional responsible AI approaches focusing on accountability, responsibility and transparency (ART) we should direct our attention to trustworthiness in the delegation process; and (c) trustworthiness as a socio-communicational challenge leads to questions that may be guided by a responsible research and innovation framework of anticipation, reflexivity, inclusion, and responsiveness. This may lead to different questions policymakers and other interested publics may ask as well as novel approaches, including regulatory sandboxes and other measures to foster a more inclusive, open and democratic culture of human-ADM relations.
    Keywords: AI, artificial decision-making (ADM), delegation, Arendt, RRI
    JEL: M14 M31 O31 O32 O33
    Date: 2019–04
    URL: http://d.repec.org/n?u=RePEc:ihs:ihswps:6&r=all
  11. By: Justine S. Hastings; Mark Howison; Sarah E. Inman
    Abstract: Misuse of prescription opioids is a leading cause of premature death in the United States. We use new state government administrative data and machine learning methods to examine whether the risk of future opioid dependence, abuse, or poisoning can be predicted in advance of an initial opioid prescription. Our models accurately predict these outcomes and identify particular prior non-opioid prescriptions, medical history, incarceration, and demographics as strong predictors. Using our model estimates, we simulate a hypothetical policy which restricts new opioid prescriptions to only those with low predicted risk. The policy’s potential benefits likely outweigh costs across demographic subgroups, even for lenient definitions of “high risk.” Our findings suggest new avenues for prevention using state administrative data, which could aid providers in making better, data-informed decisions when weighing the medical benefits of opioid therapy against the risks.
    JEL: D61 I1 I12 I18 Z18
    Date: 2019–04
    URL: http://d.repec.org/n?u=RePEc:nbr:nberwo:25791&r=all
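    A stylized Python sketch of the prediction-then-policy exercise: score risk from pre-prescription features, then restrict hypothetical prescribing to those below a risk threshold. The synthetic data, logistic model and threshold are editorial assumptions; the paper uses state administrative records and a richer model.
      import numpy as np
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import train_test_split

      rng = np.random.default_rng(0)
      n = 5000
      X = rng.normal(size=(n, 6))                      # prior prescriptions, medical history, demographics
      risk = 1 / (1 + np.exp(-(X[:, 0] + 0.5 * X[:, 1] - 1.5)))
      adverse = rng.binomial(1, risk)                  # future dependence, abuse, or poisoning

      X_tr, X_te, y_tr, y_te = train_test_split(X, adverse, random_state=0)
      clf = LogisticRegression().fit(X_tr, y_tr)
      p_hat = clf.predict_proba(X_te)[:, 1]

      threshold = 0.2                                  # hypothetical policy: prescribe only if predicted risk is low
      prescribe = p_hat < threshold
      print("share still prescribed:", prescribe.mean())
      print("adverse-outcome rate among those prescribed:", y_te[prescribe].mean())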
  12. By: Velibor V. Mišić; Georgia Perakis
    Abstract: Research in operations management has traditionally focused on models for understanding, mostly at a strategic level, how firms should operate. Spurred by the growing availability of data and recent advances in machine learning and optimization methodologies, there has been an increasing application of data analytics to problems in operations management. In this paper, we review recent applications of data analytics to operations management, in three major areas -- supply chain management, revenue management and healthcare operations -- and highlight some exciting directions for the future.
    Date: 2019–05
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1905.00556&r=all
  13. By: Makoto Chiba (Bank of Japan); Mikari Kashima (Bank of Japan); Kenta Sekiguchi (Bank of Japan)
    Abstract: This article provides an overview of the report released by a study group on legal issues regarding financial investments using algorithms/artificial intelligence (AI). The report focuses on legal issues arising when financial investment decisions are automated or black-boxed through the use of algorithms/AI. Specifically, the report discusses the points for consideration in applying laws regarding (1) regulations and civil liability issues for business operators engaged in investment management or investment advisory activities, and (2) regulations on market misconduct. The report shows that the application of some existing laws requires the presence of a certain mental state (such as purpose and intent), which is unlikely to be present when investment decisions are made using algorithms/AI. To deal with this problem, the report considers the necessity of introducing new legislation.
    Keywords: algorithm; artificial intelligence; AI; investment decision; duty to explain; duty of due care of a prudent manager; market manipulation; insider trading
    JEL: K22
    Date: 2019–04–26
    URL: http://d.repec.org/n?u=RePEc:boj:bojlab:lab19e01&r=all
  14. By: Pumplun, Luisa; Tauchert, Christoph; Heidt, Margareta
    Date: 2019–06–08
    URL: http://d.repec.org/n?u=RePEc:dar:wpaper:112582&r=all
  15. By: Otello Ardovino; Jacopo Arpetti; Marco Delmastro
    Abstract: The Artificial Intelligence paradigm (hereinafter referred to as "AI") builds on the analysis of data that can, among other things, capture snapshots of individuals' behaviors and preferences. Such data represent the most valuable currency in the digital ecosystem, where their value derives from being a fundamental asset for training machines with a view to developing AI applications. In this environment, online providers attract users by offering them services for free and receiving in exchange the data generated through the use of those services. This implicit swap is the focus of the present paper, in light of the disequilibria and market failures it may bring about. We use mobile apps and the related permission system as an ideal environment to explore these issues via econometric tools. The results, stemming from a dataset of over one million observations, show that both buyers and sellers are aware that access to digital services implicitly involves an exchange of data, although this has no considerable impact on either the level of downloads (demand) or the level of prices (supply). In other words, the implicit nature of this exchange does not allow market indicators to work efficiently. We conclude that current policies (e.g. transparency rules) may be inherently biased and we put forward suggestions for a new approach.
    Date: 2019–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1904.12134&r=all
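    A stylized Python sketch of the kind of check the abstract describes: regress log downloads on the intensity of data permissions requested by an app, alongside price and quality controls. The synthetic data and specification are illustrative editorial assumptions, not the authors' dataset or model.
      import numpy as np
      import pandas as pd
      import statsmodels.formula.api as smf

      rng = np.random.default_rng(0)
      n = 5000
      apps = pd.DataFrame({
          "n_permissions": rng.integers(0, 20, n),       # proxy for implicit data collection
          "price": rng.choice([0.0, 0.99, 1.99, 4.99], size=n),
          "rating": rng.uniform(1, 5, n),
      })
      apps["log_downloads"] = (6 + 0.4 * apps["rating"] - 0.2 * apps["price"]
                               + 0.0 * apps["n_permissions"]       # assumed null effect of permissions
                               + rng.normal(0, 1, n))

      demand = smf.ols("log_downloads ~ n_permissions + price + rating", data=apps).fit()
      print(demand.params["n_permissions"])              # near zero: demand insensitive to permissions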
  16. By: Jinks, Lu (University of Illinois at Chicago); Kniesner, Thomas J. (Claremont Graduate University); Leeth, John D. (Bentley University); Lo Sasso, Anthony T. (University of Illinois at Chicago)
    Abstract: Texas is the only state that does not mandate that employers carry workers' compensation insurance (WC) coverage. We employ a quasi-experimental design paired with a novel machine learning approach to examine the effects of switching from traditional workers' compensation to a so-called non-subscription program in Texas. Specifically, we compare before and after effects of switching to non-subscription for employees in Texas to contemporaneously measured before and after differences for non-Texas-based employees. Importantly, we study large self-insured companies operating the same business in multiple states in the US; hence the non-Texas operations represent the control sites for the Texas treatment sites. The resulting difference-in-differences estimation technique allows us to control for any companywide factors that might be confounded with switching to non-subscription. Our empirical approach also controls for injury characteristics, employment characteristics, industry, and individual characteristics such as gender, age, number of dependents, and marital status. Outcomes include number of claims reported, medical expenditures, indemnity payments, time to return to work, likelihood of having permanent disability, likelihood of claim denial, and likelihood of litigation. The data include 25 switcher companies between the years 2004 and 2016, yielding 846,376 injury incidents. Regression findings suggest that indemnity, medical payments, and work-loss fall substantially. Claim denials increase and litigation falls.
    Keywords: workers' compensation insurance, non-subscription, difference-in-differences, triple differences, machine learning, PDS-LASSO
    JEL: C54 I13 J32 J38
    Date: 2019–04
    URL: http://d.repec.org/n?u=RePEc:iza:izadps:dp12290&r=all
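    A bare-bones Python sketch of the difference-in-differences comparison described above: the coefficient on the Texas-by-post interaction is the estimate of the effect of switching to non-subscription. The synthetic data and the plain OLS specification are illustrative assumptions; the paper additionally uses PDS-LASSO to select control variables.
      import numpy as np
      import pandas as pd
      import statsmodels.formula.api as smf

      rng = np.random.default_rng(0)
      n = 4000
      df = pd.DataFrame({
          "texas": rng.integers(0, 2, n),        # treated (Texas) vs. control (non-Texas) sites
          "post": rng.integers(0, 2, n),         # after the switch to non-subscription
          "age": rng.integers(18, 65, n),
      })
      effect = -0.3                              # assumed drop in log medical payments
      df["log_medical_cost"] = (8 + 0.01 * df["age"] + 0.2 * df["texas"] + 0.1 * df["post"]
                                + effect * df["texas"] * df["post"] + rng.normal(0, 0.5, n))

      model = smf.ols("log_medical_cost ~ texas * post + age", data=df).fit()
      print(model.params["texas:post"])          # the difference-in-differences estimate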
  17. By: Bazhenov, Timofey; Fantazzini, Dean
    Abstract: This work proposes to forecast the Realized Volatility (RV) and the Value-at-Risk (VaR) of the most liquid Russian stocks using GARCH, ARFIMA and HAR models, including both the implied volatility computed from option prices and Google Trends data. The in-sample analysis showed that only the implied volatility had a significant effect on realized volatility across most stocks and estimated models, whereas Google Trends did not have any significant effect. The out-of-sample analysis highlighted that models including the implied volatility improved their forecasting performance, whereas models including internet search activity worsened their performance in several cases. Moreover, simple HAR and ARFIMA models without additional regressors often reported the best forecasts for the daily realized volatility and for the daily Value-at-Risk at the 1% probability level, showing that efficiency gains more than compensate for any possible model misspecification and parameter bias. Our empirical evidence shows that, in the case of Russian stocks, Google Trends does not capture any additional information beyond that already included in the implied volatility.
    Keywords: Forecasting; Realized Volatility; Value-at-Risk; Implied Volatility; Google Trends; GARCH; ARFIMA; HAR;
    JEL: C22 C51 C53 G17 G32
    Date: 2019–04
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:93544&r=all
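    A minimal Python sketch of a HAR-type forecasting regression of the kind compared in the paper: next-day realized volatility on its daily, weekly and monthly averages plus the additional regressors considered (implied volatility and Google Trends). The synthetic series are illustrative placeholders, not the Russian-stock data used in the paper.
      import numpy as np
      import pandas as pd
      import statsmodels.api as sm

      rng = np.random.default_rng(0)
      n = 1000
      df = pd.DataFrame({
          "rv": np.abs(rng.normal(size=n)),        # daily realized volatility (placeholder)
          "iv": np.abs(rng.normal(size=n)),        # implied volatility from option prices (placeholder)
          "gt": rng.uniform(0, 100, size=n),       # Google Trends search-intensity index (placeholder)
      })
      df["rv_d"] = df["rv"]                        # daily component
      df["rv_w"] = df["rv"].rolling(5).mean()      # weekly average
      df["rv_m"] = df["rv"].rolling(22).mean()     # monthly average
      df["rv_next"] = df["rv"].shift(-1)           # next-day target
      df = df.dropna()

      X = sm.add_constant(df[["rv_d", "rv_w", "rv_m", "iv", "gt"]])
      print(sm.OLS(df["rv_next"], X).fit().params)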
  18. By: Du, Ruihuan (?); Zhong, Yu (?); Nair, Harikesh S. (Stanford University Graduate School of Business); Cui, Bo (?); Shou, Ruyang (?)
    Abstract: This paper describes a practical system for Multi Touch Attribution (MTA) for use by a publisher of digital ads. We developed this system for JD.com, an eCommerce company, which is also a publisher of digital ads in China. The approach has two steps. The first step (“response modeling”) fits a user-level model for purchase of a product as a function of the user’s exposure to ads. The second (“credit allocation”) uses the fitted model to allocate the incremental part of the observed purchase due to advertising to the ads the user was exposed to over the previous T days. To implement step one, we train a Recurrent Neural Network (RNN) on user-level conversion and exposure data. The RNN has the advantage of flexibly handling the sequential dependence in the data in a semi-parametric way. The specific RNN formulation we implement captures the impact of advertising intensity, timing, competition, and user heterogeneity, which are known to be relevant to ad response. To implement step two, we compute Shapley Values, which have the advantage of having axiomatic foundations and satisfying fairness considerations. The specific formulation of the Shapley Value we implement respects incrementality by allocating the overall incremental improvement in conversion to the exposed ads, while handling the sequence dependence of exposures on the observed outcomes. The system is in production at JD.com, and scales to handle the high dimensionality of the problem on the platform (attribution of the orders of about 300M users, for roughly 160K brands, across 200+ ad-types, served about 80B ad-impressions over a typical 15-day period).
    Date: 2019–01
    URL: http://d.repec.org/n?u=RePEc:ecl:stabus:3761&r=all
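    A toy Python illustration of the credit-allocation step: exact Shapley values over a small set of ad exposures, given a conversion-probability function. The simple additive response function and the channel names below are illustrative stand-ins for the paper's fitted RNN and the platform's real ad types.
      from itertools import combinations
      from math import factorial

      def conversion_prob(exposed):
          """Placeholder response model: purchase probability given the set of ads seen."""
          base = 0.02
          lift = {"search": 0.05, "display": 0.02, "video": 0.03}
          return base + sum(lift[a] for a in exposed)

      def shapley_credit(ads, value):
          n = len(ads)
          credit = {}
          for ad in ads:
              others = [a for a in ads if a != ad]
              total = 0.0
              for k in range(len(others) + 1):
                  for subset in combinations(others, k):
                      weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                      total += weight * (value(set(subset) | {ad}) - value(set(subset)))
              credit[ad] = total
          return credit

      exposed_ads = ["search", "display", "video"]
      credits = shapley_credit(exposed_ads, conversion_prob)
      print(credits)
      print("sum of credits:", sum(credits.values()))    # equals the incremental lift below
      print("incremental lift:", conversion_prob(exposed_ads) - conversion_prob([]))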

This nep-big issue is ©2019 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.