nep-big New Economics Papers
on Big Data
Issue of 2019‒04‒22
fourteen papers chosen by
Tom Coupé
University of Canterbury

  1. Collaboration and Delegation Between Humans and AI: An Experimental Investigation of the Future of Work By Fügener, A.; Grahl, J.; Gupta, A.; Ketter, W.
  2. When are Google data useful to nowcast GDP? An approach via pre-selection and shrinkage By Laurent Ferrara; Anna Simoni
  3. Can Google Search Data Help Predict Macroeconomic Series? By Robin Niesert; Jochem Oorschot; Chris Veldhuisen; Kester Brons; Rutger-Jan Lange
  4. Human Rights, Artificial Intelligence and Heideggerian Technoskepticism: The Long (Worrisome?) View By Risse, Mathias
  5. Stock Forecasting using M-Band Wavelet-Based SVR and RNN-LSTMs Models By Hieu Quang Nguyen; Abdul Hasib Rahimyar; Xiaodi Wang
  6. How Effective Was the UK Carbon Tax? — A Machine Learning Approach to Policy Evaluation By Jan Abrell; Mirjam Kosch; Sebastian Rausch
  7. Universal features of price formation in financial markets: perspectives from Deep Learning By Justin Sirignano; Rama Cont
  8. New Frontiers of Chinese Defense Innovation: Artificial Intelligence and Quantum Technologies By Kania, Elsa B.
  9. Tax Morale and Perceived Intergenerational Mobility: a Machine Learning Predictive Approach By Caferra, Rocco; Morone, Andrea
  10. Automation and Irish Towns: Who's Most at Risk? By Crowley, Frank; Doran, Justin
  11. Beyond quantified ignorance: Rebuilding rationality without the bias bias By Brighton, Henry
  12. EU Merger Policy Predictability Using Random Forests By Pauline Affeldt
  13. The University of California ClioMetric History Project and Formatted Optical Character Recognition By Bleemer, Zachary
  14. Deep-learning based numerical BSDE method for barrier options By Bing Yu; Xiaojing Xing; Agus Sudjianto

  1. By: Fügener, A.; Grahl, J.; Gupta, A.; Ketter, W.
    Abstract: A defining question of our age is how AI will influence the workplace of the future and, thereby, the human condition. The dominant perspective is that the competition between AI and humans will be won by either humans or machines. We argue that the future workplace may not belong exclusively to either. Instead, it is better to use AI together with humans, combining their unique characteristics and abilities. In three experimental studies, we let humans and a state-of-the-art AI classify images alone and together. As expected, the AI outperformed humans. Humans could improve by delegating to the AI, but this combined effort still did not outperform the AI itself. The most effective scenario was inversion, in which the AI delegated to a human when it was uncertain. In theory, humans could have outperformed all other configurations by delegating effectively to the AI, but they did not. Human delegation suffered from poor self-assessment and a lack of strategy. We show that humans delegate badly even when they put effort into delegating well: despite their best intentions, their perception of task difficulty is often misaligned with the real difficulty when an image is hard. Humans did not know what they did not know, and consequently did not delegate the right images to the AI. This result is novel and important for human-AI collaboration in the workplace. We believe it has broad implications for the future of work, the design of decision support systems, and management education in the age of AI.
    Keywords: Future of Work, Artificial Intelligence, Augmented Decision Environment, Deep Learning, Human-AI Collaboration, Machine Learning, Intelligent Software Agents
    Date: 2019–04–08
  2. By: Laurent Ferrara; Anna Simoni
    Abstract: Nowcasting GDP growth is extremely useful for policy-makers to assess macroeconomic conditions in real time. In this paper, we nowcast euro area GDP with a large database of Google search data. Our objective is to check whether, and when, this specific type of information can increase GDP nowcasting accuracy once we control for official variables. To this end, we estimate shrunk bridge regressions that integrate Google data optimally screened through a targeting method, and we empirically show that this approach provides some gain in pseudo-real-time nowcasting of euro area quarterly GDP growth. In particular, we find that Google data bring useful information for GDP nowcasting during the first four weeks of the quarter, when macroeconomic information is lacking. However, as soon as official data become available, their relative nowcasting power vanishes. In addition, a true real-time analysis confirms that Google data constitute a reliable alternative when official data are lacking.
    Keywords: Nowcasting, Big data, Sure Independence Screening, Ridge Regularization.
    JEL: C53 E37
    Date: 2019
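The pre-selection-and-shrinkage pipeline described in the abstract — screen predictors by marginal correlation with the target (Sure Independence Screening), then fit a ridge regression on the survivors — can be sketched in a few lines. This is a minimal illustration on synthetic data, not the authors' implementation; the screening size k and the ridge penalty lam are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic nowcasting setup: many noisy "search" predictors, few informative.
n, p = 60, 200
X = rng.standard_normal((n, p))
y = 1.5 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.standard_normal(n)

# Step 1 -- pre-selection (Sure Independence Screening): keep the k predictors
# with the largest absolute marginal correlation with the target.
k = 10
corr = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])
keep = np.argsort(corr)[-k:]

# Step 2 -- shrinkage: closed-form ridge regression on the screened predictors.
lam = 1.0
Xs = X[:, keep]
beta = np.linalg.solve(Xs.T @ Xs + lam * np.eye(k), Xs.T @ y)
nowcast = Xs @ beta
```

Here only two of the 200 synthetic predictors are informative; the screening step reliably retains them, and the ridge step then shrinks the coefficients on the noise predictors that slipped through.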
  3. By: Robin Niesert; Jochem Oorschot (Econometric Institute, Erasmus University); Chris Veldhuisen; Kester Brons; Rutger-Jan Lange (Econometric Institute, Erasmus University)
    Abstract: We use Google search data with the aim of predicting unemployment, CPI and consumer confidence for the US, UK, Canada, Germany and Japan. Google search queries have previously proven valuable in predicting macroeconomic variables in an in-sample context. To our knowledge, the more challenging question of whether such data have out-of-sample predictive value has not yet been satisfactorily answered. We focus on out-of-sample nowcasting, and extend the Bayesian Structural Time Series model using the Hamiltonian sampler for variable selection. We find that the search data retain their value in an out-of-sample predictive context for unemployment, but not for CPI and consumer confidence. It may be that online search behaviour is a relatively reliable gauge of an individual's personal situation (employment status), but less reliable when it comes to variables that are unknown to the individual (CPI) or too general to be linked to specific search terms (consumer confidence).
    Keywords: Bayesian methods, forecasting practice, Kalman filter, macroeconomic forecasting, state space models, nowcasting, spike-and-slab, Hamiltonian sampler
    JEL: C11 C53
  4. By: Risse, Mathias (Harvard Kennedy School)
    Abstract: My concern is with the impact of Artificial Intelligence on human rights. I first identify two presumptions about ethics-and-AI that we should make only with appropriate qualifications. These presumptions are that (a) for the time being, investigating the impact of AI, especially in the human-rights domain, is a matter of investigating the impact of certain tools, and that (b) the crucial danger is that some such tools--the artificially intelligent ones--might eventually become like their creators and conceivably turn against them. I turn to Heidegger's influential philosophy of technology to argue that these presumptions require qualifications of a sort that should inform our discussion of AI. Next I argue that one major challenge is how human rights will prevail in an era quite possibly shaped by an enormous increase in economic inequality. Currently the human-rights movement is rather unprepared to deal with the resulting challenges. What is needed is a greater focus on social justice/distributive justice, both domestically and globally, to make sure societies do not fall apart. I also argue that, in the long run, we must be prepared to deal with more types of moral status than we currently do, and that quite plausibly some machines will have some type of moral status, which may or may not fall short of the moral status of human beings (a point also emerging from the Heidegger discussion). Machines may have to be integrated into human social and political lives.
    Date: 2019–02
  5. By: Hieu Quang Nguyen; Abdul Hasib Rahimyar; Xiaodi Wang
    Abstract: Predicting future stock values has always been heavily desired, albeit very difficult. The difficulty arises because stock prices are non-stationary and follow no explicit functional form, so predictions are best made through analysis of financial stock data. To handle big data sets, current convention relies on the Moving Average. However, by using the Wavelet Transform in place of the Moving Average to denoise stock signals, financial data can be smoothed and more accurately decomposed. The transformed, denoised, and more stable stock data can then be fed to non-parametric statistical methods, such as Support Vector Regression (SVR) and Recurrent Neural Network (RNN) based Long Short-Term Memory (LSTM) networks, to predict future stock prices. These methods yield a more accurate stock forecast and, in turn, increased profits.
    Date: 2019–04
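The substitution the abstract proposes — wavelet denoising in place of a moving average — can be illustrated with a one-level Haar transform and soft thresholding. This is a simplification: the paper uses M-band wavelets, and the threshold below is an arbitrary choice.

```python
import numpy as np

def haar_denoise(x, threshold):
    """One-level Haar wavelet denoising with soft thresholding.
    (Illustrative stand-in for the M-band wavelets used in the paper.)"""
    x = np.asarray(x, dtype=float)
    assert len(x) % 2 == 0, "even length required for one Haar level"
    pairs = x.reshape(-1, 2)
    approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)   # low-frequency trend
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)   # high-frequency noise
    # Soft-threshold the detail coefficients: shrink small (noisy) ones to zero.
    detail = np.sign(detail) * np.maximum(np.abs(detail) - threshold, 0.0)
    # Inverse transform back to a denoised series.
    out = np.empty_like(x)
    out[0::2] = (approx + detail) / np.sqrt(2)
    out[1::2] = (approx - detail) / np.sqrt(2)
    return out

# Example: a noisy upward trend, standing in for a stock price series.
t = np.linspace(0, 1, 64)
noisy = t + 0.05 * np.random.default_rng(1).standard_normal(64)
smooth = haar_denoise(noisy, threshold=0.1)
```

Unlike a moving average, the thresholding step suppresses only the small high-frequency coefficients, so sharp genuine moves (large detail coefficients) survive the smoothing.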
  6. By: Jan Abrell (ZHAW Winterthur and ETH Zurich, Switzerland); Mirjam Kosch (ZHAW Winterthur and ETH Zurich, Switzerland); Sebastian Rausch (ETH Zurich, Switzerland)
    Abstract: Carbon taxes are commonly seen as a rational policy response to climate change, but little is known about their performance from an ex-post perspective. This paper analyzes the emissions and cost impacts of the UK CPS, a carbon tax levied on all fossil-fired power plants. To overcome the problem of a missing control group, we propose a novel approach for policy evaluation which leverages economic theory and machine learning techniques for counterfactual prediction. Our results indicate that in the period 2013–2016 the CPS lowered emissions by 6.2 percent at an average cost of €18 per ton. We find substantial temporal heterogeneity in tax-induced impacts which stems from variation in relative fuel prices. An important implication for climate policy is that a higher carbon tax does not necessarily lead to higher emissions reductions or higher costs.
    Keywords: Climate Policy, Carbon Tax, Carbon Pricing, Electricity, Coal, Natural Gas, United Kingdom, Carbon Price Surcharge, Policy Evaluation, Causal Inference, Machine Learning
    JEL: C54 Q48 Q52 Q58 L94
    Date: 2019–04
  7. By: Justin Sirignano (UIUC - University of Illinois at Urbana Champaign - University of Illinois at Urbana-Champaign [Urbana]); Rama Cont (LPSM UMR 8001 - Laboratoire de Probabilités, Statistique et Modélisation - UPMC - Université Pierre et Marie Curie - Paris 6 - UPD7 - Université Paris Diderot - Paris 7 - CNRS - Centre National de la Recherche Scientifique)
    Abstract: Using a large-scale Deep Learning approach applied to a high-frequency database containing billions of electronic market quotes and transactions for US equities, we uncover nonparametric evidence for the existence of a universal and stationary price formation mechanism relating the dynamics of supply and demand for a stock, as revealed through the order book, to subsequent variations in its market price. We assess the model by testing its out-of-sample predictions for the direction of price moves given the history of price and order flow, across a wide range of stocks and time periods. The universal price formation model exhibits a remarkably stable out-of-sample prediction accuracy across time, for a wide range of stocks from different sectors. Interestingly, these results also hold for stocks which are not part of the training sample, showing that the relations captured by the model are universal and not asset-specific. The universal model — trained on data from all stocks — outperforms, in terms of out-of-sample prediction accuracy, asset-specific linear and nonlinear models trained on time series of any given stock, showing that the universal nature of price formation weighs in favour of pooling together financial data from various stocks, rather than designing asset- or sector-specific models as commonly done. Standard data normalizations based on volatility, price level or average spread, or partitioning the training data into sectors or categories such as large/small tick stocks, do not improve training results. On the other hand, inclusion of price and order flow history over many past observations improves forecasting performance, showing evidence of path-dependence in price dynamics.
    Date: 2018–03–30
  8. By: Kania, Elsa B.
    Abstract: Will the Chinese military succeed in advancing new frontiers of defense innovation? China has already emerged as a powerhouse in artificial intelligence and quantum technologies. The continued advances in these dual-use technologies may be leveraged for military applications pursuant to a national strategy of military-civil fusion. At this point, the trajectory of technological developments is uncertain, and considerable challenges remain to the actualization of deeper fusion of China’s defense and commercial sectors. However, if successful, China’s ambitions to lead in these strategic technologies could enable it to pioneer new paradigms of military power.
    Keywords: Social and Behavioral Sciences, China, defense innovation, artificial intelligence, AI, quantum technology, dual use, military-civil fusion
    Date: 2018–05–30
  9. By: Caferra, Rocco; Morone, Andrea
    Abstract: The purpose of this paper is to investigate the linkage between perceived intergenerational mobility and preferences for tax payment. We do not have a single dataset covering both; however, missing data can be predicted by employing different methods. We compare the efficiency of k-nearest-neighbors (kNN), Random Forest (RF) and Tobit-2-sample-2-stage (T2S2S) techniques in predicting perceived intergenerational mobility, and then exploit the predicted values to estimate the relation with tax morale. Results provide evidence of a strong negative relation between perceived mobility and tax cheating, suggesting that fairness in tax payment must also be seen in the light of the perceived efficiency of the welfare state in providing more opportunities across generations.
    Keywords: intergenerational mobility; tax morale; missing data;
    JEL: D63 I31 I32
    Date: 2019–04
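Of the three imputation methods the abstract compares, kNN is the simplest to sketch. The toy example below (synthetic data, Euclidean distance, unweighted mean of neighbours) shows the basic mechanism; it is not the authors' specification:

```python
import numpy as np

def knn_impute(X_obs, y_obs, X_miss, k=3):
    """Predict a missing variable for each row of X_miss as the mean of the
    k nearest (Euclidean) rows in the observed data."""
    preds = []
    for x in X_miss:
        dist = np.linalg.norm(X_obs - x, axis=1)
        nearest = np.argsort(dist)[:k]
        preds.append(y_obs[nearest].mean())
    return np.array(preds)

# Toy data: respondents described by two covariates; the survey answer
# (e.g. perceived mobility) is observed for four of them, missing for two.
X_obs = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]])
y_obs = np.array([1.0, 1.0, 5.0, 5.0])
X_miss = np.array([[0.05, 0.0], [1.05, 1.0]])

imputed = knn_impute(X_obs, y_obs, X_miss, k=2)
print(imputed)  # → [1. 5.]
```

Each respondent with a missing answer is assigned the average answer of the respondents most similar on the observed covariates.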
  10. By: Crowley, Frank; Doran, Justin
    Abstract: Future automation and artificial intelligence technologies are expected to have a major impact on the labour market. Despite the growing literature on automation and the risk it poses to employment, there is very little analysis of the sub-national geographical implications of automation risk. This paper makes a number of significant contributions to the nascent field of regional differences in the spatial distribution of automation job risk. First, we deploy the automation risk methodology developed by Frey and Osborne (2017) at a national level using occupational and sector data, and apply a novel regional disaggregation method to identify the proportion of jobs at risk of automation across the 200 towns of Ireland with a population of 1,500 or more, using data from the 2016 census. This provides imputed values of automation risk across Irish towns. Second, we employ an economic geography framework to examine which local place characteristics are most likely to be associated with high-risk towns, while also considering whether the automation risk of towns has a spatial pattern across the Irish urban landscape. We find that the automation risk of towns is mainly explained by differences in population, education levels, age demographics, the proportion of creative occupations, town size and industry composition. The impact of automation in Ireland is going to be felt far and wide, with two out of every five jobs at high risk. Many high-risk towns have low-risk towns nearby, and many low-risk towns have high-risk neighbours; there are also some concentrations of lower-risk towns and, separately, of higher-risk towns. Our results suggest that the pattern of automation job risk across Ireland demands policy that is not one-size-fits-all, but rather a localised, place-based, bottom-up approach to policy intervention.
    Date: 2019
  11. By: Brighton, Henry
    Abstract: If we reassess the rationality question under the assumption that the uncertainty of the natural world is largely unquantifiable, where do we end up? In this article the author argues that we arrive at a statistical, normative, and cognitive theory of ecological rationality. The main casualty of this rebuilding process is optimality. Once we view optimality as a formal implication of quantified uncertainty rather than an ecologically meaningful objective, the rationality question shifts from being axiomatic/probabilistic in nature to being algorithmic/predictive in nature. These distinct views on rationality mirror fundamental and longstanding divisions in statistics.
    Keywords: cognitive science, rationality, ecological rationality, bounded rationality, bias bias, bias/variance dilemma, Bayesianism, machine learning, pattern recognition, decision making under uncertainty, unquantifiable uncertainty
    JEL: A12 B4 C1 C44 C52 C53 C63 D18
    Date: 2019
  12. By: Pauline Affeldt
    Abstract: I study the predictability of the EC’s merger decision procedure before and after the 2004 merger policy reform, based on a dataset covering all affected markets of mergers with an official decision documented by DG Comp between 1990 and 2014. Using the highly flexible, non-parametric random forest algorithm to predict DG Comp’s assessment of competitive concerns in markets affected by a merger, I find that the predictive performance of the random forests is much better than that of simple linear models. In particular, the random forests do much better in predicting the rare event of competitive concerns. Second, post-reform, DG Comp seems to base its assessment on a more complex interaction of merger and market characteristics than pre-reform. The highly flexible random forest algorithm is able to detect these potentially complex interactions and therefore still allows for high prediction precision.
    Keywords: Merger policy reform, DG Competition, Prediction, Random Forests
    JEL: K21 L40
    Date: 2019
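Why a random forest can beat a linear model in this setting can be illustrated with a small sketch: a rare outcome driven by an interaction of two characteristics, which a forest of deep trees can represent but a linear score cannot. The sketch uses scikit-learn and synthetic data (the paper's software and variables are not specified here); the `class_weight="balanced"` option is one common way to handle rare events and is an assumption, not the paper's setting.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 2000
X = rng.standard_normal((n, 5))
# Rare outcome (~2.5% of cases) driven by an INTERACTION of two market
# characteristics -- a structure a simple linear model cannot represent.
y = ((X[:, 0] > 1.0) & (X[:, 1] > 1.0)).astype(int)

# class_weight="balanced" upweights the rare positive class during training.
clf = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                             random_state=0)
clf.fit(X[:1500], y[:1500])
pred = clf.predict(X[1500:])
accuracy = (pred == y[1500:]).mean()
```

The trees split first on one characteristic and then on the other, recovering the interaction region directly from the data.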
  13. By: Bleemer, Zachary
    Abstract: In what ways—and to what degree—have universities contributed to the long-run growth, health, economic mobility, and gender/ethnic equity of their students’ communities and home states? The University of California ClioMetric History Project (UC-CHP), based at the Center for Studies in Higher Education, extends prior research on this question in two ways. First, we have developed a novel digitization protocol—formatted optical character recognition (fOCR)—which transforms scanned structured and semi-structured texts like university directories and catalogs into high-quality computer-readable databases. We use fOCR to produce annual databases of students (1890s to 1940s), faculty (1900 to present), course descriptions (1900 to present), and detailed budgets (1911-2012) for many California universities. Digitized student records, for example, illuminate the high proportion of 1900s university students who were female and from rural areas, as well as large family income differences between male and female students and between students at public and private universities. Second, UC-CHP is working to photograph, process with fOCR, and analyze restricted student administrative records to construct a comprehensive database of California university students and their enrollment behavior. This paper describes UC-CHP’s methodology and provides technical documentation for the project, while also presenting examples of the range of data the project is exploring and prospects for future research.
    Keywords: Education, Social and Behavioral Sciences, History of Higher Education, Big Data, Natural Language Processing, University of California
    Date: 2018–02–01
  14. By: Bing Yu; Xiaojing Xing; Agus Sudjianto
    Abstract: It is well known that an option price is a solution to a certain partial differential equation (PDE) with terminal conditions (payoff functions), and that there is a close association between the solution of a PDE and the solution of a backward stochastic differential equation (BSDE). We can either solve the PDE to obtain option prices or solve its associated BSDE. Recently, a deep-learning technique has been applied to price options via the BSDE approach: deep learning is used to learn some deterministic functions, which are then used in solving the BSDE with terminal conditions. In this paper, we extend the deep-learning technique to solve a PDE with both terminal and boundary conditions. In particular, we employ the technique to price barrier options using Brownian motion bridges.
    Date: 2019–04
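The Brownian-bridge device mentioned at the end has a closed form: between simulated log-prices x1 and x2 a time Δt apart, the probability that the path crossed an upper barrier b > max(x1, x2) in between is exp(-2(b - x1)(b - x2)/(σ²Δt)). The sketch below uses this correction inside a plain Monte Carlo pricer for an up-and-out call under Black-Scholes dynamics; it illustrates only the bridge correction, not the paper's deep-learning BSDE solver.

```python
import math
import random

def up_and_out_call_mc(s0, strike, barrier, sigma, r, T, n_steps, n_paths, seed=0):
    """Monte Carlo price of an up-and-out call, with a Brownian-bridge
    correction for barrier crossings between discrete time steps."""
    rng = random.Random(seed)
    dt = T / n_steps
    payoff_sum = 0.0
    for _ in range(n_paths):
        s = s0
        survive = 1.0  # probability the path has NOT hit the barrier so far
        for _ in range(n_steps):
            z = rng.gauss(0.0, 1.0)
            s_next = s * math.exp((r - 0.5 * sigma**2) * dt
                                  + sigma * math.sqrt(dt) * z)
            if s >= barrier or s_next >= barrier:
                survive = 0.0  # knocked out at an observed point
                break
            # Brownian-bridge probability of crossing between the two points
            # (applied to the log-price, whose volatility is sigma).
            p_cross = math.exp(-2.0 * math.log(barrier / s)
                               * math.log(barrier / s_next) / (sigma**2 * dt))
            survive *= 1.0 - p_cross
            s = s_next
        payoff_sum += survive * max(s - strike, 0.0)
    return math.exp(-r * T) * payoff_sum / n_paths

# Example call (illustrative parameters).
price = up_and_out_call_mc(s0=100, strike=100, barrier=120, sigma=0.2,
                           r=0.0, T=1.0, n_steps=50, n_paths=2000)
```

Without the bridge correction, a discretely monitored simulation systematically underestimates the knock-out probability; the correction accounts for crossings between grid points.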

This nep-big issue is ©2019 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at . For comments, please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject line; otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.