nep-big New Economics Papers
on Big Data
Issue of 2020‒01‒13
39 papers chosen by
Tom Coupé
University of Canterbury

  1. Terror and Tourism: The Economic Consequences of Media Coverage By Timothy Besley; Thiemo Fetzer; Hannes Mueller
  2. Will robots automate your job away? Full employment, basic income, and economic democracy By McGaughey, Ewan
  3. Understanding the Great Recession Using Machine Learning Algorithms By Rickard Nyman; Paul Ormerod
  4. Valid simultaneous inference in high-dimensional settings (with the HDM package for R) By Philipp Bach; Victor Chernozhukov; Martin Spindler
  5. Adaptive Discrete Smoothing for High-Dimensional and Nonlinear Panel Data By Xi Chen; Victor Chernozhukov; Ye Luo; Martin Spindler
  6. A Gated Recurrent Unit Approach to Bitcoin Price Prediction By Aniruddha Dutta; Saket Kumar; Meheli Basu
  7. FAIRNESS MEETS MACHINE LEARNING: SEARCHING FOR A BETTER BALANCE By Ekaterina Semenova; Ekaterina Perevoshchikova; Alexey Ivanov; Mikhail Erofeev
  8. Forecasting Bitcoin closing price series using linear regression and neural networks models By Nicola Uras; Lodovica Marchesi; Michele Marchesi; Roberto Tonelli
  9. How do machine learning and non-traditional data affect credit scoring? New evidence from a Chinese fintech firm By Leonardo Gambacorta; Yiping Huang; Han Qiu; Jingyi Wang
  10. Predicting intraday jumps in stock prices using liquidity measures and technical indicators By Ao Kong; Hongliang Zhu; Robert Azencott
  11. Deep Learning for Decision Making and the Optimization of Socially Responsible Investments and Portfolio By Nhi N.Y.Vo; Xue-Zhong He; Shaowu Liu; Guandong Xu
  12. Alternative personal data governance models By Greshake Tzovaras, Bastian; Ball, Mad Price
  13. Big Data, Data Science and Emerging Analytic tools: Impact in social science By Saha, Satabdi; Maiti, Tapabrata
  14. Estimation and HAC-based Inference for Machine Learning Time Series Regressions By Andrii Babii; Eric Ghysels; Jonas Striaukas
  15. Corporate default forecasting with machine learning By Mirko Moscatelli; Simone Narizzano; Fabio Parlapiano; Gianluca Viggiano
  16. Attribution of Customers’ Actions Based on Machine Learning Approach By Kadyrov, Timur; Ignatov, Dmitry I.
  17. Uniform inference in high-dimensional Gaussian graphical models By Sven Klaassen; Jannis Kück; Martin Spindler; Victor Chernozhukov
  18. The Comparison of Methods for Individual Treatment Effect Detection By Semenova, Daria; Temirkaeva, Maria
  19. AI and Robotics Innovation: a Sectoral and Geographical Mapping using Patent Data By Van Roy, Vincent; Vertesy, Daniel; Damioli, Giacomo
  20. Spatial Information and the Legibility of Urban Form: Big Data in Urban Morphology By Boeing, Geoff
  21. A Consistently Oriented Basis for Eigenanalysis By Jay Damask
  22. Efficient Algorithms for Constructing Multiplex Networks Embedding By Zolnikov, Pavel; Zubov, Maxim; Nikitinsky, Nikita; Makarov, Ilya
  23. Priority to unemployed immigrants? A causal machine learning evaluation of training in Belgium By Bart Cockx; Michael Lechner; Joost Bollens
  24. The Dynamics of Non-Performing Loans during Banking Crises: A New Database By Anil Ari; Sophia Chen; Lev Ratnovski
  25. Optimal Data Collection for Randomized Control Trials By Pedro Carneiro; Sokbae (Simon) Lee; Daniel Wilhelm
  26. Inference on average treatment effects in aggregate panel data settings By Victor Chernozhukov; Kaspar Wüthrich; Yinchu Zhu
  27. General Game Playing B-to-B Price Negotiations By Michael, Friedrich; Ignatov, Dmitry I.
  28. VAT tax gap prediction: a 2-steps Gradient Boosting approach By Giovanna Tagliaferri; Daria Scacciatelli; Pierfrancesco Alaimo Di Loro
  29. The behavioral economics of artificial intelligence: Lessons from experiments with computer players By March, Christoph
  30. Sanction or Financial Crisis? An Artificial Neural Network-Based Approach to model the impact of oil price volatility on Stock and industry indices By Somayeh Kokabisaghi; Mohammadesmaeil Ezazi; Reza Tehrani; Nourmohammad Yaghoubi
  31. The Brazilian Amazon’s Double Reversal of Fortune By Burgess, Robin; Costa, Francisco J M; Olken, Ben
  32. Shedding Light on the Shadow Economy: A Global Database and the Interaction with the Official One By Leandro Medina; Friedrich Schneider
  33. A new approach to Early Warning Systems for small European banks By Bräuning, Michael; Malikkidou, Despo; Scricco, Giorgio; Scalone, Stefano
  34. On the Stability and Growth Pact compliance: what is predictable with machine learning? By Kea BARET; Theophilos PAPADIMITRIOU
  35. Implications of Automation for Global Migration By Yixiao ZHOU; Rod TYERS
  36. Prior Knowledge Neural Network for Automatic Feature Construction in Financial Time Series By Jie Fang; Jianwu Lin; Yong Jiang; Shutao Xia
  37. News-based Sentiment Indicators By Chengyu Huang; Sean Simpson; Daria Ulybina; Agustin Roitman
  38. Banking Supervision, Monetary Policy and Risk-Taking: Big Data Evidence from 15 Credit Registers By Carlo Altavilla; Miguel Boucinha; José-Luis Peydró; Frank Smets
  39. Forecasting Implied Volatility Smile Surface via Deep Learning and Attention Mechanism By Shengli Chen; Zili Zhang

  1. By: Timothy Besley; Thiemo Fetzer; Hannes Mueller
    Abstract: This paper studies the economic effects of news coverage of violent events. To do so, we combine monthly aggregated and anonymized credit card data on tourism spending from 114 origin countries and 5 tourist destinations (Turkey, Egypt, Tunisia, Israel and Morocco) with a large corpus of more than 446 thousand newspaper articles covering news on the 5 destination countries from a subset of 57 tourist origin countries. We document that violent events in a destination are followed by sharp spikes in negative reporting at origin and contractions in tourist activity. Media coverage of violence has a large independent effect on tourist spending beyond what can be accounted for by controlling for the incidence of violence. We develop a model in which tourist beliefs, actual violence and media reporting are modelled together. This model allows us to quantify the effect of violent events and reporting.
    Keywords: terror, armed violence, tourism, media reports, economic integration, supervised machine learning, random forest
    JEL: D83 F14 D74 L82 F15 H12
    Date: 2020–01
    URL: http://d.repec.org/n?u=RePEc:bge:wpaper:1141&r=all
  2. By: McGaughey, Ewan (King's College, London)
    Abstract: Will the internet, robotics and artificial intelligence mean a ‘jobless future’? A recent narrative, endorsed by prominent tech-billionaires, says we face mass unemployment, and we need a basic income. In contrast, this article shows why the law can achieve full employment with fair incomes, and holidays with pay. Universal human rights, including the right to ‘share in scientific advancement and its benefits’, set the proper guiding principles. Three distinct views of the causes of unemployment are that it is a ‘natural’ phenomenon, that technology may propel it, or that it is a social and legal choice: to let capital owners restrict investment in jobs. Only the third view has any credible evidence to support it. Technology may create redundancies, but unemployment is an entirely social phenomenon. After World War Two, 42% of UK jobs were redundant but social policy maintained full employment, and it can be done again. This said, transition to new technology, when markets are left alone, can be exceedingly slow: a staggering 88% of American horses lost their jobs after the Model T Ford, but only over 45 years. Taking lessons from history, it is clear that unemployment is driven by inequality of wealth and of votes in the economy. To uphold human rights, governments should reprogramme the law, for full employment, fair incomes and reduced working time, on a living planet. Robot owners will not automate your job away, if we defend economic democracy.
    Date: 2019–10–15
    URL: http://d.repec.org/n?u=RePEc:osf:lawarx:udbj8&r=all
  3. By: Rickard Nyman; Paul Ormerod
    Abstract: Nyman and Ormerod (2017) show that the machine learning technique of random forests has the potential to give early warning of recessions. Applying the approach to a small set of financial variables and replicating as far as possible a genuine ex ante forecasting situation, over the period since 1990 the four-step-ahead predictions are distinctly more accurate than those actually made by the professional forecasters. Here we extend the analysis by examining the contributions made to the Great Recession of the late 2000s by each of the explanatory variables. We disaggregate private sector debt into its household and non-financial corporate components. We find that both household and non-financial corporate debt were key determinants of the Great Recession. We find a considerable degree of non-linearity in the explanatory models. In contrast, the public sector debt to GDP ratio appears to have made very little contribution. It did rise sharply during the Great Recession, but this was a consequence of the sharp fall in economic activity rather than a cause. We obtain similar results for both the United States and the United Kingdom.
    Date: 2020–01
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2001.02115&r=all
  4. By: Philipp Bach (Institute for Fiscal Studies); Victor Chernozhukov (Institute for Fiscal Studies and MIT); Martin Spindler (Institute for Fiscal Studies)
    Abstract: Due to the increasing availability of high-dimensional empirical applications in many research disciplines, valid simultaneous inference becomes more and more important. For instance, high-dimensional settings might arise in economic studies due to very rich data sets with many potential covariates or in the analysis of treatment heterogeneities. Also, the evaluation of potentially more complicated (non-linear) functional forms of the regression relationship leads to many potential variables for which simultaneous inferential statements might be of interest. Here we provide a review of classical and modern methods for simultaneous inference in (high-dimensional) settings and illustrate their use by a case study using the R package hdm. The R package hdm implements valid, powerful and efficient joint hypothesis tests for a potentially large number of coefficients, as well as the construction of simultaneous confidence intervals, and therefore provides useful methods to perform valid post-selection inference based on the LASSO. R and the package hdm are open-source software projects and can be freely downloaded from CRAN: http://cran.r-project.org.
    Date: 2019–06–12
    URL: http://d.repec.org/n?u=RePEc:ifs:cemmap:30/19&r=all
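    The hdm package itself is R software; as a rough, language-neutral illustration of the simultaneous-inference problem, the sketch below contrasts pointwise confidence intervals with a Bonferroni adjustment, a classical baseline of the kind the review covers. Data and dimensions are invented; hdm's LASSO-based joint inference is more powerful than this adjustment.
```python
# Pointwise vs. Bonferroni-adjusted simultaneous confidence intervals on a
# simulated regression; variable names and data are illustrative only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, p = 500, 10
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [1.0, 0.5, -0.5]
y = X @ beta + rng.normal(size=n)

fit = sm.OLS(y, sm.add_constant(X)).fit()
pointwise = fit.conf_int(alpha=0.05)         # valid per coefficient
simultaneous = fit.conf_int(alpha=0.05 / p)  # Bonferroni: valid jointly
print(pointwise[1:])      # slopes only; the joint intervals are wider
print(simultaneous[1:])
```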
  5. By: Xi Chen; Victor Chernozhukov; Ye Luo; Martin Spindler
    Abstract: In this paper we develop a data-driven smoothing technique for high-dimensional and non-linear panel data models. We allow for individual-specific (non-linear) functions and estimation with econometric or machine learning methods by using weighted observations from other individuals. The weights are determined in a data-driven way and depend on the similarity between the corresponding functions, measured on the basis of initial estimates. The key feature of such a procedure is that it clusters individuals based on the distance/similarity between them, estimated in a first stage. Our estimation method can be combined with various statistical estimation procedures, in particular modern machine learning methods, which are particularly fruitful in the high-dimensional case and with complex, heterogeneous data. The approach can be interpreted as a “soft clustering” in comparison to traditional “hard clustering” that assigns each individual to exactly one group. We conduct a simulation study which shows that prediction can be greatly improved by using our estimator. Finally, we analyze a big data set from didichuxing.com, a leading company in the transportation industry, to analyze and predict the gap between supply and demand based on a large set of covariates. Our estimator clearly performs much better in out-of-sample prediction compared to existing linear panel data estimators.
    Date: 2019–12
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1912.12867&r=all
  6. By: Aniruddha Dutta; Saket Kumar; Meheli Basu
    Abstract: In today's era of big data, deep learning and artificial intelligence have formed the backbone for cryptocurrency portfolio optimization. Researchers have investigated various state-of-the-art machine learning models to predict Bitcoin price and volatility. Machine learning models like the recurrent neural network (RNN) and long short-term memory (LSTM) have been shown to perform better than traditional time series models in cryptocurrency price prediction. However, very few studies have applied sequence models with robust feature engineering to predict future pricing. In this study, we investigate a framework with a set of advanced machine learning methods and a fixed set of exogenous and endogenous factors to predict daily Bitcoin prices. We study and compare different approaches using the root mean squared error (RMSE). Experimental results show that a gated recurrent unit (GRU) model with recurrent dropout performs better than popular existing models. We also show that simple trading strategies, when implemented with our proposed GRU model and with proper learning, can lead to financial gain.
    Date: 2019–12
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1912.11166&r=all
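    A minimal sketch of the kind of model the abstract describes: a GRU with recurrent dropout trained on rolling windows of daily features. Window length, layer size, dropout rate and the random training data are illustrative assumptions, not the authors' settings.
```python
# A GRU regressor with recurrent dropout on rolling windows of features;
# all sizes and the random training data are placeholders.
import numpy as np
from tensorflow.keras import Sequential, layers

window, n_features = 30, 5                  # e.g. 30 days of prices/factors
model = Sequential([
    layers.Input(shape=(window, n_features)),
    layers.GRU(32, recurrent_dropout=0.2),  # the regularization highlighted
    layers.Dense(1),                        # next-day price (or return)
])
model.compile(optimizer="adam", loss="mse")

X = np.random.rand(200, window, n_features).astype("float32")
y = np.random.rand(200, 1).astype("float32")
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
```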
  7. By: Ekaterina Semenova (National Research University Higher School of Economics); Ekaterina Perevoshchikova (National Research University Higher School of Economics); Alexey Ivanov (National Research University Higher School of Economics); Mikhail Erofeev (Lomonosov Moscow State University)
    Abstract: Machine learning (ML) affects nearly every aspect of our lives, including the weightiest ones such as criminal justice. As it becomes more widespread, however, it raises the question of how we can integrate fairness into ML algorithms to ensure that all citizens receive equal treatment and to avoid imperiling society’s democratic values. In this paper we study various formal definitions of fairness that can be embedded into ML algorithms and show that the root cause of most debates about AI fairness is society’s lack of a consistent understanding of fairness generally. We conclude that AI regulations stipulating an abstract fairness principle are societally ineffective. Capitalizing on extensive related work in computer science and the humanities, we present an approach that can help ML developers choose a formal definition of fairness suitable for a particular country and application domain. Abstract rules from the human world fail in the ML world, and ML developers will never be free from criticism if the status quo remains. We argue that the law should shift from an abstract definition of fairness to a formal legal definition. Legislators and society as a whole should tackle the challenge of defining fairness, but since no definition perfectly matches the human sense of fairness, legislators must publicly acknowledge the drawbacks of the chosen definition and assert that the benefits outweigh them. Doing so creates transparent standards of fairness to ensure that technology serves the values and best interests of society.
    Keywords: Artificial Intelligence; Bias; Fairness; Machine Learning; Regulation; Values; Antidiscrimination Law;
    JEL: K19
    Date: 2019
    URL: http://d.repec.org/n?u=RePEc:hig:wpaper:93/law/2019&r=all
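    The paper's subject is formal fairness definitions; one standard member of that family is demographic (statistical) parity, which requires equal positive-decision rates across groups. Below is a toy check of the parity gap; the decision vector and protected attribute are invented for illustration.
```python
# Demographic parity check: compare positive-decision rates across the two
# groups defined by a protected attribute A. Toy vectors, invented here.
import numpy as np

y_hat = np.array([1, 0, 1, 1, 0, 1, 0, 0])  # model decisions
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # protected attribute A

rate_0 = y_hat[group == 0].mean()
rate_1 = y_hat[group == 1].mean()
print(f"positive-rate gap: {abs(rate_0 - rate_1):.2f}")  # 0.0 = parity
```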
  8. By: Nicola Uras; Lodovica Marchesi; Michele Marchesi; Roberto Tonelli
    Abstract: This paper studies how to forecast the daily closing price series of Bitcoin, using data on prices and volumes of prior days. Bitcoin price behaviour is still largely unexplored, presenting new opportunities. We compared our results with two recent works on Bitcoin price forecasting and with a well-known recent paper that uses daily NASDAQ closing prices of Intel, National Bank and Microsoft shares, spanning a 3-year interval. We followed different approaches in parallel, implementing both statistical techniques and machine learning algorithms. The simple linear regression (SLR) model for univariate series forecasting uses only closing prices, whereas the multiple linear regression (MLR) model for multivariate series uses both price and volume data. We applied the ADF test to these series, which turned out to be indistinguishable from a random walk. We also used two artificial neural networks: MLP and LSTM. We then partitioned the dataset into shorter sequences representing different price regimes, obtaining the best results using more than one previous price, thus confirming our regime hypothesis. All the models were evaluated in terms of MAPE and relative RMSE. They performed well, and were overall better than those obtained in the benchmarks. Based on the results, it was possible to demonstrate the efficacy of the proposed methodology and its contribution to the state of the art.
    Date: 2020–01
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2001.01127&r=all
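    The two evaluation metrics named in the abstract, written out as commonly defined; the exact normalization behind the authors' "relative RMSE" is an assumption here (RMSE scaled by the mean level of the series).
```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

def relative_rmse(y_true, y_pred):
    """RMSE scaled by the mean level of the series (one common convention)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.sqrt(np.mean((y_true - y_pred) ** 2)) / np.mean(y_true)

print(mape([100, 110], [98, 113]))          # toy numbers
print(relative_rmse([100, 110], [98, 113]))
```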
  9. By: Leonardo Gambacorta; Yiping Huang; Han Qiu; Jingyi Wang
    Abstract: This paper compares the predictive power of credit scoring models based on machine learning techniques with that of traditional loss and default models. Using proprietary transaction-level data from a leading fintech company in China for the period between May and September 2017, we test the performance of different models to predict losses and defaults both in normal times and when the economy is subject to a shock. In particular, we analyse the case of an (exogenous) change in regulation policy on shadow banking in China that caused lending to decline and credit conditions to deteriorate. We find that the model based on machine learning and non-traditional data is better able to predict losses and defaults than traditional models in the presence of a negative shock to the aggregate credit supply. One possible reason for this is that machine learning can better mine the non-linear relationship between variables in a period of stress. Finally, the comparative advantage of the model that uses the fintech credit scoring technique based on machine learning and big data tends to decline for borrowers with a longer credit history.
    Keywords: fintech, credit scoring, non-traditional information, machine learning, credit risk
    JEL: G17 G18 G23 G32
    Date: 2019–12
    URL: http://d.repec.org/n?u=RePEc:bis:biswps:834&r=all
  10. By: Ao Kong; Hongliang Zhu; Robert Azencott
    Abstract: Predicting intraday stock jumps is a significant but challenging problem in finance. Due to the instantaneity and imperceptibility of intraday stock jumps, relevant studies on their predictability remain limited. This paper proposes a data-driven approach to predict intraday stock jumps using the information embedded in liquidity measures and technical indicators. Specifically, a trading day is divided into a series of 5-minute intervals, and at the end of each interval, candidate attributes defined by liquidity measures and technical indicators are input into machine learning algorithms to predict the arrival of a stock jump, as well as its direction, in the following 5-minute interval. An empirical study is conducted on the level-2 high-frequency data of 1,271 stocks in the Shenzhen Stock Exchange of China to validate our approach. The results provide initial evidence of the predictability of jump arrivals and jump directions using level-2 stock data, as well as the effectiveness of combining liquidity measures and technical indicators in this prediction. We also reveal the superiority of random forests over the other machine learning algorithms in building prediction models. Importantly, our study provides a portable data-driven approach that exploits liquidity and technical information from level-2 stock data to predict intraday price jumps of individual stocks.
    Date: 2019–12
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1912.07165&r=all
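    The prediction setup in schematic form: attributes computed over one 5-minute interval feed a random forest that classifies whether a jump arrives in the next interval. Features and labels below are random placeholders for the paper's level-2 liquidity measures and technical indicators.
```python
# Classify "jump in the next 5-minute interval" from features of the
# current interval; random data stand in for the liquidity/technical inputs.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(5000, 8))      # e.g. spreads, depth, RSI, volume...
y = rng.integers(0, 2, size=5000)   # 1 = jump arrives next interval

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```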
  11. By: Nhi N.Y.Vo; Xue-Zhong He (Finance Discipline Group, University of Technology Sydney); Shaowu Liu; Guandong Xu
    Abstract: A socially responsible investment portfolio takes into consideration the environmental, social and governance aspects of companies, and has recently become an important topic for both financial investors and researchers. Traditional investment and portfolio theories, which are used for the optimization of financial investment portfolios, are inadequate for decision-making and the construction of an optimized socially responsible investment portfolio. In response to this problem, we introduced a Deep Responsible Investment Portfolio (DRIP) model, containing a multivariate bidirectional long short-term memory neural network, to predict stock returns for the construction of a socially responsible investment portfolio. The deep reinforcement learning technique was adapted to retrain the neural networks and rebalance the portfolio periodically. Our empirical results revealed that the DRIP framework could achieve competitive financial performance and better social impact compared to traditional portfolio models, sustainable indexes and funds.
    Keywords: Socially responsible investment; Portfolio optimization; Multivariate analytics; Deep reinforcement learning; Decision support systems
    Date: 2019–01–01
    URL: http://d.repec.org/n?u=RePEc:uts:ppaper:2019-3&r=all
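    In outline, the return-prediction component named above is a multivariate bidirectional LSTM; a minimal Keras sketch follows, with sizes chosen for illustration only (the full DRIP model adds the reinforcement-learning retraining and rebalancing loop).
```python
# Multivariate bidirectional LSTM head for return prediction; illustrative.
from tensorflow.keras import Sequential, layers

window, n_assets = 60, 20                    # 60 days of multivariate input
model = Sequential([
    layers.Input(shape=(window, n_assets)),
    layers.Bidirectional(layers.LSTM(32)),   # reads the window both ways
    layers.Dense(n_assets),                  # predicted return per asset
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```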
  12. By: Greshake Tzovaras, Bastian; Ball, Mad Price
    Abstract: The not-so-secret ingredient that underlies all successful Artificial Intelligence / Machine Learning (AI/ML) methods is training data. There would be no facial recognition, no targeted advertisements and no self-driving cars if it were not for large enough data sets with which those algorithms have been trained to perform their tasks. Given how central these data sets are, important ethics questions arise: How is data collection performed? And how do we govern its use? This chapter – part of a forthcoming book – looks at why new data governance strategies are needed; investigates the relation of different data governance models to historic consent approaches; and compares different implementations of personal data exchange models.
    Date: 2019–12–26
    URL: http://d.repec.org/n?u=RePEc:osf:metaar:bthj7&r=all
  13. By: Saha, Satabdi; Maiti, Tapabrata
    Abstract: The rapid advancement of the Internet and the Internet of Things has led to companies generating gigantic volumes of data in every field of business. Big data research has thus become one of the most prominent topics of discussion, garnering simultaneous attention from academia and industry. This paper attempts to understand the significance of big data in current scientific research and outline its unique characteristics, otherwise unavailable from traditional data sources. We focus on how big data has altered the scope and dimension of data science, making it severely interdisciplinary. We further discuss the significance and opportunities of big data in the domain of social science research, with a scrutiny of the challenges previously faced while using smaller datasets. Given the extensive utilization of big data analytics in all forms of socio-technical research, we argue the need to critically interrogate its assumptions and biases, thereby advocating the creation of a just and ethical big data world.
    Date: 2019–12–29
    URL: http://d.repec.org/n?u=RePEc:osf:socarx:ft27y&r=all
  14. By: Andrii Babii; Eric Ghysels; Jonas Striaukas
    Abstract: Time series regression analysis in econometrics typically involves a framework relying on a set of mixing conditions to establish consistency and asymptotic normality of parameter estimates and HAC-type estimators of the residual long-run variances to conduct proper inference. This article introduces structured machine learning regressions for high-dimensional time series data using the aforementioned commonly used setting. To recognize the time series data structures we rely on the sparse-group LASSO estimator. We derive a new Fuk-Nagaev inequality for a class of $\tau$-dependent processes with heavier than Gaussian tails, nesting $\alpha$-mixing processes as a special case, and establish estimation, prediction, and inferential properties, including convergence rates of the HAC estimator for the long-run variance based on LASSO residuals. An empirical application to nowcasting US GDP growth indicates that the estimator performs favorably compared to other alternatives and that the text data can be a useful addition to more traditional numerical data.
    Date: 2019–12
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1912.06307&r=all
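    The sparse-group LASSO the abstract relies on solves, in its standard form (notation ours; the paper's groups reflect the time-series structure of the regressors):
```latex
% Sparse-group LASSO objective (notation ours):
\hat{\beta} \;=\; \arg\min_{\beta}\;
  \frac{1}{n}\sum_{t=1}^{n}\bigl(y_t - x_t^{\top}\beta\bigr)^{2}
  \;+\; \lambda\Bigl(\alpha\,\lVert\beta\rVert_{1}
  \;+\; (1-\alpha)\sum_{g\in\mathcal{G}}\lVert\beta_{g}\rVert_{2}\Bigr)
```
    Setting α = 1 recovers the LASSO and α = 0 the group LASSO; per the abstract, the HAC long-run variance estimator is then computed from the residuals of this fit.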
  15. By: Mirko Moscatelli (Bank of Italy); Simone Narizzano (Bank of Italy); Fabio Parlapiano (Bank of Italy); Gianluca Viggiano (Bank of Italy)
    Abstract: We analyze the performance of a set of machine learning (ML) models in predicting default risk, using standard statistical models, such as the logistic regression, as a benchmark. When only a limited information set is available, for example in the case of financial indicators, we find that ML models provide substantial gains in discriminatory power and precision compared with statistical models. This advantage diminishes when high quality information, such as credit behavioral indicators obtained from the Credit Register, is also available, and becomes negligible when the dataset is small. We also evaluate the consequences of using an ML-based rating system on the supply of credit and the number of borrowers gaining access to credit. ML models channel a larger share of credit towards safer and larger borrowers and result in lower credit losses for lenders.
    Keywords: Credit Scoring, Machine Learning, Random Forest, Gradient Boosting Machine
    JEL: G2 C52 C55 D83
    Date: 2019–12
    URL: http://d.repec.org/n?u=RePEc:bdi:wptemi:td_1256_19&r=all
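    The comparison design in miniature: a logistic regression benchmark against a gradient boosting machine on the same default-prediction task, scored by discriminatory power (AUC). The synthetic, imbalanced data stand in for the paper's financial indicators.
```python
# Logit benchmark vs. gradient boosting on one synthetic, imbalanced
# default-prediction task; AUC proxies the paper's discriminatory power.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=4000, n_features=15, weights=[0.9],
                           random_state=0)   # defaults are the rare class
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for name, clf in [("logit", LogisticRegression(max_iter=1000)),
                  ("gbm", GradientBoostingClassifier(random_state=0))]:
    auc = roc_auc_score(y_te, clf.fit(X_tr, y_tr).predict_proba(X_te)[:, 1])
    print(name, round(auc, 3))
```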
  16. By: Kadyrov, Timur; Ignatov, Dmitry I.
    Abstract: A multichannel attribution model based on gradient boosting over trees is proposed and compared with state-of-the-art models: bagged logistic regression, the Markov chain approach, and the Shapley value. Experiments on digital advertising datasets showed that the proposed model outperforms the considered solutions on the ROC AUC metric. In addition, the problem of predicting the probability of conversion by the consumer was solved using an ensemble of the analyzed algorithms, with the resulting meta-features enriched with consumer data and offline activities of the advertising campaign.
    Keywords: Multi-touch attribution; Gradient boosting; Digital advertising; Data-driven marketing
    JEL: C45 M31
    Date: 2019–09–23
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:97312&r=all
  17. By: Sven Klaassen (Institute for Fiscal Studies); Jannis Kück (Institute for Fiscal Studies); Martin Spindler (Institute for Fiscal Studies); Victor Chernozhukov (Institute for Fiscal Studies and MIT)
    Abstract: Graphical models have become a very popular tool for representing dependencies within a large set of variables and are key for representing causal structures. We provide results for uniform inference on high-dimensional graphical models, with the number of target parameters d being possibly much larger than the sample size. This is particularly important when certain features or structures of a causal model should be recovered. Our results highlight how, in high-dimensional settings, graphical models can be estimated and recovered with modern machine learning methods in complex data sets. To construct simultaneous confidence regions on many target parameters, sufficiently fast estimation rates of the nuisance functions are crucial. In this context, we establish uniform estimation rates and sparsity guarantees of the square-root estimator in a random design under approximate sparsity conditions that might be of independent interest for related problems in high dimensions. We also demonstrate in a comprehensive simulation study that our procedure has good small sample properties.
    Date: 2019–06–12
    URL: http://d.repec.org/n?u=RePEc:ifs:cemmap:29/19&r=all
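    For reference, the square-root estimator mentioned in the abstract is, in its standard form (notation ours):
```latex
% Square-root LASSO objective (notation ours):
\hat{\beta} \;=\; \arg\min_{\beta}\;
  \sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - x_i^{\top}\beta\bigr)^{2}}
  \;+\; \frac{\lambda}{n}\,\lVert\beta\rVert_{1}
```
    Its appeal is that the penalty level λ can be chosen without estimating the noise variance.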
  18. By: Semenova, Daria; Temirkaeva, Maria
    Abstract: Today, treatment effect estimation at the individual level is a vital problem in many areas of science and business. For example, in marketing, estimates of the treatment effect are used to select the most efficient promo-mechanics; in medicine, individual treatment effects are used to determine the optimal dose of medication for each patient; and so on. At the same time, the question of choosing the best method, i.e., the method that ensures the smallest predictive error (for instance, RMSE) or the highest total (average) value of the effect, remains open. Accordingly, in this paper we compare the effectiveness of machine learning methods for the estimation of individual treatment effects. The comparison is performed on the Criteo Uplift Modeling Dataset. We show that the combination of the Logistic Regression method and the Difference Score method, as well as the Uplift Random Forest method, provide the best correctness of Individual Treatment Effect prediction on the top 30% observations of the test dataset.
    Keywords: Individual Treatment Effect; ITE; Machine Learning; Random Forest; XGBoost; SVM; Randomized Experiments; A/B testing; Uplift Random Forest
    JEL: C10 M30
    Date: 2019–09–23
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:97309&r=all
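    A sketch of the two-model ("Difference Score") approach that the paper pairs with logistic regression: fit separate response models on treated and control units, score uplift as the difference of predicted probabilities, then rank and keep the top 30%. Data are synthetic placeholders, not the Criteo dataset.
```python
# Two-model ("Difference Score") uplift with logistic regression;
# synthetic data, not the Criteo Uplift Modeling Dataset.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))
t = rng.integers(0, 2, size=n)         # treatment indicator
y = (rng.random(n) < 0.10 + 0.05 * t * (X[:, 0] > 0)).astype(int)

m_treat = LogisticRegression().fit(X[t == 1], y[t == 1])
m_ctrl = LogisticRegression().fit(X[t == 0], y[t == 0])
uplift = m_treat.predict_proba(X)[:, 1] - m_ctrl.predict_proba(X)[:, 1]

top30 = np.argsort(-uplift)[: int(0.3 * n)]  # rank and keep the top 30%
print("mean predicted uplift, top 30%:", round(uplift[top30].mean(), 3))
```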
  19. By: Van Roy, Vincent; Vertesy, Daniel; Damioli, Giacomo
    Abstract: Economic activities based on the invention, production and distribution of artificial intelligence (AI) technologies have recently emerged worldwide. Yet, little is known about the innovative activities, location and growth performance of AI innovators. This chapter aims to map and analyse the global innovative landscape of AI by exploring 155,000 patents identified as AI-related by means of text-mining techniques. It highlights the emergence and evolution of AI technologies and identifies AI hotspots across the world. It explores the scale and pervasiveness of AI activities across sectors, and evaluates the economic performance of AI innovators using firm accounting information. Finally, it assesses recent trends in venture capital investments in AI as financial support to promising AI startups. The findings of this chapter reveal a tremendous increase in AI patenting activity since 2013, with a significant boom in 2015-2016. While most AI patenting activity remains concentrated in the sectors of software programming and manufacturing of electronic equipment and machinery, there are clear signs of cross-fertilisation towards (non-tech) sectors. The market of AI patenting firms is very vibrant and characterised by a large increase in new and small players with economic performance above the industry average. This trend is also reflected in the recent increase in venture capital for AI startups.
    Keywords: Artificial intelligence, innovation, patents, robotics
    JEL: O31 O33
    Date: 2019
    URL: http://d.repec.org/n?u=RePEc:zbw:glodps:433&r=all
  20. By: Boeing, Geoff (Northeastern University)
    Abstract: Urban planning and morphology have relied on analytical cartography and visual communication tools for centuries to illustrate spatial patterns, propose designs, compare alternatives, and engage the public. Classic urban form visualizations – from Giambattista Nolli’s ichnographic maps of Rome to Allan Jacobs’s figure-ground diagrams of city streets – have compressed physical urban complexity into easily comprehensible information artifacts. Today we can enhance these traditional workflows through the Smart Cities paradigm of understanding cities via user-generated content and harvested data in an information management context. New spatial technology platforms and big data offer new lenses to understand, evaluate, monitor, and manage urban form and evolution. This paper builds on the theoretical framework of visual cultures in urban planning and morphology to introduce and situate computational data science processes for exploring urban fabric patterns and spatial order. It demonstrates these workflows with OSMnx and data from OpenStreetMap, a collaborative spatial information system and mapping platform, to examine street network patterns, orientations, and configurations in different study sites around the world, considering what these reveal about the urban fabric. The age of ubiquitous urban data and computational toolkits opens up a new era of worldwide urban form analysis from integrated quantitative and qualitative perspectives.
    Date: 2019–10–01
    URL: http://d.repec.org/n?u=RePEc:osf:socarx:vhrdc&r=all
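    A minimal OSMnx workflow of the kind the paper demonstrates: download a street network from OpenStreetMap and compute basic morphological statistics (requires the osmnx package; the study site here is arbitrary).
```python
# Download a drivable street network and compute basic stats with OSMnx.
import osmnx as ox

G = ox.graph_from_place("Piedmont, California, USA", network_type="drive")
stats = ox.basic_stats(G)
print(stats["n"], stats["m"], stats["streets_per_node_avg"])
ox.plot_graph(G)   # a figure-ground style plot of the network
```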
  21. By: Jay Damask
    Abstract: Repeated application of machine-learning, eigen-centric methods to an evolving dataset reveals that eigenvectors calculated by well-established computer implementations are not stable along an evolving sequence. This is because the sign of any one eigenvector may point along either the positive or negative direction of its associated eigenaxis, and for any one eigen call the sign does not matter when calculating a solution. This work reports an algorithm that creates a consistently oriented basis of eigenvectors. The algorithm postprocesses any well-established eigen call and is therefore agnostic to the particular implementation of the latter. Once consistently oriented, directional statistics can be applied to the eigenvectors in order to track their motion and summarize their dispersion. When a consistently oriented eigensystem is applied to methods of machine-learning, the time series of training weights becomes interpretable in the context of the machine-learning model. Ordinary linear regression is used to demonstrate such interpretability. A reference implementation of the algorithm reported herein has been written in Python and is freely available, both as source code and through the thucyd Python package.
    Date: 2019–12
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1912.12983&r=all
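    The instability at issue, and the simplest common fix (flip each eigenvector so that its largest-magnitude entry is positive), sketched below. This convention illustrates the problem; it is not the paper's algorithm, which additionally tracks orientation along the evolving sequence and is implemented in the thucyd package.
```python
# Fix eigenvector signs by a deterministic orientation convention.
import numpy as np

def orient(V):
    """Flip each column so its largest-magnitude entry is positive."""
    idx = np.abs(V).argmax(axis=0)                 # row of the largest entry
    signs = np.sign(V[idx, np.arange(V.shape[1])])
    return V * signs

A = np.cov(np.random.default_rng(0).normal(size=(200, 4)), rowvar=False)
w, V = np.linalg.eigh(A)
print(orient(V))    # stable under arbitrary sign flips of V's columns
print(orient(-V))   # identical output
```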
  22. By: Zolnikov, Pavel; Zubov, Maxim; Nikitinsky, Nikita; Makarov, Ilya
    Abstract: Network embedding has become a very promising technique in the analysis of complex networks. It is a method to project the nodes of a network into a low-dimensional vector space while retaining the structure of the network based on vector similarity. There are many methods of network embedding developed for traditional single-layer networks. On the other hand, multilayer networks can provide more information about relationships between nodes. In this paper, we present our random-walk-based multilayer network embedding and compare it with single-layer and multilayer network embeddings. For this purpose, we used several classic datasets usually used in network embedding experiments and also collected our own dataset of papers and authors indexed in Scopus.
    Keywords: Network embedding; Multi-layer network; Machine learning on graphs
    JEL: C45 I20
    Date: 2019–09–23
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:97310&r=all
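    The random-walk step for a multilayer graph, in a deliberately simplified form (the uniform layer choice is our assumption; the paper's sampling scheme may differ). Such walks are then typically fed to a skip-gram model to produce the embeddings.
```python
# One random-walk step chooses a layer, then a neighbour in that layer.
import random

def random_walk(graph_layers, start, length, seed=0):
    """graph_layers: list of adjacency dicts {node: [neighbours]}."""
    rng = random.Random(seed)
    walk = [start]
    for _ in range(length - 1):
        node = walk[-1]
        options = [g[node] for g in graph_layers if g.get(node)]
        if not options:        # dead end in every layer
            break
        walk.append(rng.choice(rng.choice(options)))
    return walk

layer_a = {1: [2, 3], 2: [1], 3: [1]}
layer_b = {1: [3], 2: [3], 3: [1, 2]}
print(random_walk([layer_a, layer_b], start=1, length=6))
```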
  23. By: Bart Cockx; Michael Lechner; Joost Bollens
    Abstract: We investigate heterogeneous employment effects of Flemish training programmes. Based on administrative individual data, we analyse programme effects at various aggregation levels using Modified Causal Forests (MCF), a causal machine learning estimator for multiple programmes. While all programmes have positive effects after the lock-in period, we find substantial heterogeneity across programmes and types of unemployed. Simulations show that assigning the unemployed to the programmes that maximise their individual gains, as identified in our estimation, can considerably improve effectiveness. Simplified rules, such as one giving priority to unemployed persons with low employability, mostly recent migrants, lead to about half of the gains obtained by more sophisticated rules.
    Date: 2019–12
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1912.12864&r=all
  24. By: Anil Ari; Sophia Chen; Lev Ratnovski
    Abstract: This paper presents a new dataset on the dynamics of non-performing loans (NPLs) during 88 banking crises since 1990. The data show similarities across crises during NPL build-ups but less so during NPL resolutions. We find a close relationship between NPL problems—elevated and unresolved NPLs—and the severity of post-crisis recessions. A machine learning approach identifies a set of pre-crisis predictors of NPL problems related to weak macroeconomic, institutional, corporate, and banking sector conditions. Our findings suggest that reducing pre-crisis vulnerabilities and promptly addressing NPL problems during a crisis are important for post-crisis output recovery.
    Date: 2019–12–06
    URL: http://d.repec.org/n?u=RePEc:imf:imfwpa:19/272&r=all
  25. By: Pedro Carneiro (Institute for Fiscal Studies and University College London); Sokbae (Simon) Lee (Institute for Fiscal Studies and Columbia University and IFS); Daniel Wilhelm (Institute for Fiscal Studies and cemmap and UCL)
    Abstract: In a randomized control trial, the precision of an average treatment effect estimator and the power of the corresponding t-test can be improved either by collecting data on additional individuals, or by collecting additional covariates that predict the outcome variable. To design the experiment, a researcher needs to solve this tradeoff subject to her budget constraint. We show that this optimization problem is equivalent to optimally predicting outcomes by the covariates, which in turn can be solved with existing machine learning techniques using pre-experimental data such as other similar studies, a census, or a household survey. In two empirical applications, we show that our procedure can lead to reductions of up to 58% in the costs of data collection, or improvements of the same magnitude in the precision of the treatment effect estimator.
    Date: 2019–05–02
    URL: http://d.repec.org/n?u=RePEc:ifs:cemmap:21/19&r=all
  26. By: Victor Chernozhukov (Institute for Fiscal Studies and MIT); Kaspar Wüthrich (Institute for Fiscal Studies and UCSD); Yinchu Zhu (Institute for Fiscal Studies)
    Abstract: This paper studies inference on treatment effects in aggregate panel data settings with a single treated unit and many control units. We propose new methods for making inference on average treatment effects in settings where both the number of pre-treatment and the number of post-treatment periods are large. We use linear models to approximate the counterfactual mean outcomes in the absence of the treatment. The counterfactuals are estimated using constrained Lasso, an essentially tuning free regression approach that nests difference-in-differences and synthetic control as special cases. We propose a K-fold cross-fitting procedure to remove the bias induced by regularization. To avoid the estimation of the long run variance, we construct a self-normalized t-statistic. The test statistic has an asymptotically pivotal distribution (a Student t-distribution with K - 1 degrees of freedom), which makes our procedure very easy to implement. Our approach has several theoretical advantages. First, it does not rely on any sparsity assumptions. Second, it is fully robust against misspecification of the linear model. Third, it is more efficient than difference-in-means and difference-in-differences estimators. The proposed method demonstrates an excellent performance in simulation experiments, and is taken to a data application, where we re-evaluate the economic consequences of terrorism.
    Date: 2019–06–12
    URL: http://d.repec.org/n?u=RePEc:ifs:cemmap:32/19&r=all
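    Schematically, with K cross-fitting folds yielding fold-level effect estimates, the self-normalized statistic described above takes the form (notation ours):
```latex
% Self-normalized statistic over K cross-fitting folds (notation ours):
t \;=\; \frac{\sqrt{K}\,\bar{\hat{\tau}}}{s_{\hat{\tau}}},
\qquad
\bar{\hat{\tau}} = \frac{1}{K}\sum_{k=1}^{K}\hat{\tau}_{k},
\qquad
s_{\hat{\tau}}^{2} = \frac{1}{K-1}\sum_{k=1}^{K}
  \bigl(\hat{\tau}_{k} - \bar{\hat{\tau}}\bigr)^{2}
```
    Under the null it is asymptotically Student t with K - 1 degrees of freedom, which is why no long-run variance estimate is needed.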
  27. By: Michael, Friedrich; Ignatov, Dmitry I.
    Abstract: This paper discusses the scientific and practical perspectives of using general game playing in business-to-business price negotiations as part of the Procurement 4.0 revolution. The status quo of digital price negotiation software, which emerged from intuitive solutions to business goals and is referred to in industry as electronic auctions, is summarized in a scientific context. A description of aspects such as auctioneers’ interventions, asymmetry among players and time-dependent features reveals that today’s electronic auctions are better termed price games. This paper strongly suggests general game playing as the crucial technology for automating human rule setting in those games. Game theory, genetic programming, experimental economics and AI human-player simulation are also discussed as satellite topics. SIDL-type game description languages and their formal game-theoretic foundations are presented.
    Keywords: Procurement 4.0; Artificial Intelligence; General Game Playing; Game Theory; Mechanism Design; Experimental Economics; Behavioral Economics; z-Tree; Cognitive Modeling; e-Auctions; barter double auction; B-to-B Price Negotiations; English Auction; Dutch auction; Sealed-Bid Auction; Industry 4.0
    JEL: C63 C72 C90 D04 D44
    Date: 2019–09–23
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:97313&r=all
  28. By: Giovanna Tagliaferri; Daria Scacciatelli; Pierfrancesco Alaimo Di Loro
    Abstract: Tax evasion is the illegal non-payment of taxes by individuals, corporations, and trusts. It results in a loss of state revenue that can undermine the effectiveness of government policies. One measure of tax evasion is the so-called tax gap: the difference between the income that should be reported to the tax authorities and the amount actually reported. However, economists lack a robust method for estimating the tax gap through a bottom-up approach based on fiscal audits. This is difficult because the declared tax base is available for the whole population, whereas the income that should have been reported is observed only for a small, non-random sample of audited units. This induces a selection bias which invalidates standard statistical methods. Here, we use machine learning, based on a 2-step Gradient Boosting model, to correct for the selection bias without requiring any strong distributional assumption. We use our method to estimate the Italian VAT gap for individual firms based on information gathered from administrative sources. Our algorithm estimates the potential VAT turnover of Italian individual firms for the fiscal year 2011 and suggests that the tax gap is about 30% of the total potential tax base. Comparisons with other methods show that our technique offers a significant improvement in predictive performance.
    Date: 2019–12
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1912.03781&r=all
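    A hedged sketch of a generic 2-step gradient boosting scheme of the kind described: step 1 models the probability of a positive undeclared amount among audited firms, step 2 models its size on the positives, and the product gives the expected undeclared base. The paper's exact two steps and bias correction may differ.
```python
# Step 1: probability of any undeclared base; step 2: its size if positive.
# Synthetic audited sample; firm covariates and amounts are invented.
import numpy as np
from sklearn.ensemble import (GradientBoostingClassifier,
                              GradientBoostingRegressor)

rng = np.random.default_rng(0)
X = rng.normal(size=(3000, 6))                  # firm characteristics
has_gap = (rng.random(3000) < 0.3).astype(int)  # any undeclared base?
gap = np.where(has_gap, np.exp(rng.normal(size=3000)), 0.0)

step1 = GradientBoostingClassifier().fit(X, has_gap)
step2 = GradientBoostingRegressor().fit(X[has_gap == 1], gap[has_gap == 1])

expected_gap = step1.predict_proba(X)[:, 1] * step2.predict(X)
print("estimated total undeclared base:", round(expected_gap.sum(), 1))
```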
  29. By: March, Christoph
    Abstract: Artificial intelligence (AI) is starting to pervade economic and social life, rendering strategic interactions with artificial agents more and more common. At the same time, experimental economic research has increasingly employed computer players to advance our understanding of strategic interaction in general. What can this strand of research teach us about an AI-shaped future? I review 90 experimental studies using computer players. I find that, in a nutshell, humans act more selfishly and more rationally in the presence of computer players, and they are often able to exploit these players. Still, many open questions remain.
    Keywords: Experiment, Robots, Computer players, Survey
    JEL: C90 C92 O33
    Date: 2019
    URL: http://d.repec.org/n?u=RePEc:zbw:bamber:154&r=all
  30. By: Somayeh Kokabisaghi; Mohammadesmaeil Ezazi; Reza Tehrani; Nourmohammad Yaghoubi
    Abstract: Financial markets in oil-dependent countries have always been influenced by changes in the international energy market, in particular the oil price. It is therefore of considerable interest to investigate the impact of the oil price on financial markets. The aim of this paper is to model the impact of oil price volatility on stock and industry indices, considering the gas and gold prices, the exchange rate and trading volume as explanatory variables. We also propose feed-forward networks as an accurate method to model non-linearity. We use data from 2009 to 2018, split into two periods: during the international energy sanctions and post-sanction. The results show that feed-forward networks perform well in predicting the variables and that oil price volatility has a significant impact on stock and industry market indices. The result is more robust in the post-sanction period and during the global financial crisis in 2014. It is important for financial market analysts and policy makers to note which factors influence the financial market, and when, especially in an oil-dependent country such as Iran facing uncertainty in international politics. This research analyses the results in two different periods, which is important in terms of oil price shocks and international energy sanctions. Moreover, using neural networks in the methodology gives more accurate and reliable results.
    Keywords: Feed-forward networks, Industry index, International energy sanction, Oil price volatility
    Date: 2019–12
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1912.04015&r=all
  31. By: Burgess, Robin; Costa, Francisco J M (FGV EPGE Brazilian School of Economics and Finance); Olken, Ben
    Abstract: We use high-resolution satellite data to determine how Amazonian deforestation changes discretely at the Brazilian international border. We document two dramatic reversals. In 2000, Brazilian pixels were 37 percent more likely to be deforested, and between 2001 and 2005 annual Brazilian deforestation was more than three times the rate observed across the border. In 2006, just after Brazil introduced policies to reduce deforestation, these differences disappear. However, from 2014, amid a period of economic crisis and deteriorating commitment to environmental regulation, Brazilian deforestation rates jump back up to near pre-reform levels. These results demonstrate the power of the state to affect whether wilderness ecosystems are conserved or exploited.
    Date: 2019–08–09
    URL: http://d.repec.org/n?u=RePEc:osf:socarx:67xg5&r=all
  32. By: Leandro Medina; Friedrich Schneider
    Abstract: Using the multiple indicators, multiple causes (MIMIC) approach, this paper generates a novel global database by estimating the size of the shadow economy for 157 countries over 1991 to 2017. The results suggest that the OECD countries have by far the smallest shadow economies, with values below 20% of official GDP, while the shadow economy is larger in Latin America and Sub-Saharan Africa, averaging almost 38 and 39 percent of GDP, respectively. The average over all countries and over 1991 to 2017 is 30.9%. What is really remarkable is that the average decline of the shadow economy from 1991 to 2017 is 6.8 percentage points. The shadow economy is particularly large in countries such as Bolivia (Georgia) with 62.9 (61.7) percent of GDP, and small in countries such as Switzerland (United States) with 6.4 (7.6) percent of GDP, on average. Robustness tests, including the use of satellite data on night-light intensity as a proxy for the size of countries’ economies and a comparison of the results with 23 national statistical offices’ measures of informality (using the discrepancy method), demonstrate stable and similar results. Finally, the interaction of the shadow economy with the official one is investigated. Theoretically, the effect of the shadow economy on the official one is an open question; first results of this interaction, for Pakistan over 1976 to 2015, show a negative (positive) effect in the short (long) run.
    Keywords: shadow economy, informal economy, survey, multiple indicators multiple causes (MIMIC), comparison of different estimation methods, the light intensity approach, shadow economy results for 157 countries, interaction of the shadow economy with the official one
    JEL: C39 C51 C82 H11 H26
    Date: 2019
    URL: http://d.repec.org/n?u=RePEc:ces:ceswps:_7981&r=all
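    For reference, the MIMIC structure is, in its textbook form (notation ours): a latent shadow-economy index driven by observed causes and reflected in observed indicators,
```latex
% MIMIC structure (notation ours): latent index \eta_t, causes x_t,
% indicators y_t:
\eta_t = \gamma^{\top} x_t + \zeta_t,
\qquad
y_t = \lambda\,\eta_t + \varepsilon_t
```
    with the estimated index then calibrated to an external base value to express it as a share of official GDP.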
  33. By: Bräuning, Michael; Malikkidou, Despo; Scricco, Giorgio; Scalone, Stefano
    Abstract: This paper describes a machine learning technique for the timely identification of individual bank financial distress. Our work represents the first attempt in the literature to develop an early warning system specifically for small European banks. We build a decision tree model using a dataset of official supervisory reporting, complemented with qualitative banking sector and macroeconomic variables. We propose a new and wider definition of financial distress in order to capture distress cases at an earlier stage than the existing literature on bank failures; given the rarity of bank defaults in Europe, this significantly increases the number of events on which to estimate the model, improving its precision, and it leaves a time window for supervisory intervention. The Quinlan C5.0 algorithm we use to estimate the model also allows us to adopt a conservative approach to misclassification: as we deal with bank distress cases, we consider missing a distress event twice as costly as raising a false flag. Our final model comprises 12 variables in 19 nodes, and outperforms a logit model estimation, which we use to benchmark our analysis; validation and back-testing also suggest that the good performance of our model is relatively stable and robust.
    Keywords: bank distress, decision tree, machine learning, Quinlan
    JEL: E58 C01 C50
    Date: 2019–12
    URL: http://d.repec.org/n?u=RePEc:ecb:ecbwps:20192348&r=all
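    The cost asymmetry described above (a missed distress event costs twice a false flag) can be expressed in scikit-learn via class weights rather than C5.0's cost matrix; the synthetic data, and the 19 leaf nodes as a loose nod to the reported tree size, are placeholders.
```python
# Encode the 2:1 misclassification cost with class weights in scikit-learn.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=12, weights=[0.95],
                           random_state=0)        # 1 = distress (rare)
tree = DecisionTreeClassifier(class_weight={0: 1, 1: 2},  # miss costs 2x
                              max_leaf_nodes=19, random_state=0)
tree.fit(X, y)
print("training accuracy:", round(tree.score(X, y), 3))
```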
  34. By: Kea BARET; Theophilos PAPADIMITRIOU
    Abstract: The aim of this paper is to propose simple advanced indicators to help prevent internal imbalances in the European Union. The paper also highlights that new methods from the machine learning field can be appropriate for forecasting fiscal policy outcomes, instead of traditional econometric approaches. The Stability and Growth Pact (SGP), and especially the 3% limit set on the fiscal balance, aims to coordinate the fiscal policies of the European Union member states and ensure debt sustainability. The Macroeconomic Imbalance Procedure (MIP) scoreboard introduced by the European Commission aims to verify the good conduct of public finances. We propose an analysis of the determinants of SGP compliance by the 28 European Union members between 2006 and 2018, using a Support Vector Machine model. Beyond testing whether the MIP scoreboard variables really matter for forecasting the risk of unsustainability, we also test a set of macroeconomic, monetary, and financial variables and apply a prior feature selection model which highlights the best predictors. We provide evidence that the main primary indicators of the MIP scoreboard are not useful for forecasting SGP compliance, and we propose new variables for forecasting compliance with the European Union’s supranational fiscal rule.
    Keywords: Fiscal Rules; Stability and Growth Pact; Forecasting; Machine Learning
    JEL: E61 H11 H61 H62
    Date: 2019
    URL: http://d.repec.org/n?u=RePEc:ulp:sbbeta:2019-48&r=all
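    The two-stage design in outline: a prior feature-selection step feeding a Support Vector Machine classifier of SGP compliance. The selector, kernel and synthetic data are illustrative assumptions, not the paper's specification.
```python
# Feature selection, scaling, and an SVM classifier in one pipeline.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=364, n_features=40, n_informative=8,
                           random_state=0)    # say, 28 countries x 13 years
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif, k=8)),  # keep the best predictors
    ("svm", SVC(kernel="rbf")),
])
print("CV accuracy:", round(cross_val_score(pipe, X, y, cv=5).mean(), 3))
```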
  35. By: Yixiao ZHOU (Crawford School of Public Policy, Australian National University); Rod TYERS (Business School, University of Western Australia; Research School of Economics and Centre for Applied Macroeconomic Analysis (CAMA), Australian National University)
    Abstract: Relative wages and the share of total value added accruing to low-skill workers have declined during the past three decades, among both OECD countries and large developing countries. The primary beneficiary until recently has been skill, the supply of which has risen as education investment has increased. The rise in artificial intelligence (AI)-driven automation suggests that declines in value added shares accruing to low-skill workers will continue. Indeed, AI-driven automation creates an impulse for diminished labor market performance by low-skill workers throughout the world but most prominently in high-fertility, relatively youthful regions with comparatively strong growth in low-skill labor forces. The implied bias against such regions will therefore enhance emigration pressure. This paper offers a preliminary analysis of these effects. Central to the paper is a model of the global economy that includes general demography and real wage sensitive bilateral migration behavior, which is used to help quantify potential future growth in real wage disparities and the extent, direction and content of associated migration flows. Overall, global wage inequality is increased by expanded skilled migration, primarily because of large increases in skilled wage premia that arise in developing regions of origin. Inter-regional divergences in skilled wages are reduced, however, due to the additional skilled labour market arbitrage opportunities offered by more open migration policies.
    Keywords: Automation, demographic change, migration incentives, labor markets and economic growth
    JEL: C68 E22 E27 F21 F43 J11
    Date: 2019
    URL: http://d.repec.org/n?u=RePEc:uwa:wpaper:19-19&r=all
  36. By: Jie Fang; Jianwu Lin; Yong Jiang; Shutao Xia
    Abstract: In quantitative finance, useful features are constructed by human experts, but this method is inefficient. Thus, automatic feature construction algorithms have received more and more attention. The state-of-the-art technique in this field is to represent features as reverse Polish expressions and then use genetic programming (GP) to reconstruct them. In this paper, we propose a new method, the alpha discovery neural network (ADN), which can automatically construct features using a neural network. We make several contributions. Firstly, we put forward a new objective function using empirical knowledge from financial signal processing, and we also fix its non-differentiability problem. Secondly, we use a model stealing technique to learn from other prior knowledge, which brings enough diversity into our network. Thirdly, we propose a method to measure the diversity of different financial features. Experiments show that ADN can produce more diversified and more informative features than GP. Moreover, if GP’s output is used as prior knowledge, the final results are significantly improved by using ADN.
    Date: 2019–12
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1912.06236&r=all
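    Features as reverse Polish expressions, as in the genetic-programming baseline the abstract describes: a tiny evaluator over named price/volume series. The operator set and the example expression are our own illustration.
```python
# Evaluate a reverse Polish expression over named series.
import numpy as np

OPS = {"+": np.add, "-": np.subtract, "*": np.multiply, "/": np.divide}

def eval_rpn(tokens, data):
    """tokens: RPN list of series names and operators; data: name -> array."""
    stack = []
    for tok in tokens:
        if tok in OPS:
            b, a = stack.pop(), stack.pop()
            stack.append(OPS[tok](a, b))
        else:
            stack.append(data[tok])
    return stack.pop()

data = {"close": np.array([10.0, 11.0, 12.0]),
        "volume": np.array([100.0, 80.0, 120.0])}
print(eval_rpn(["close", "volume", "/"], data))  # a toy constructed feature
```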
  37. By: Chengyu Huang; Sean Simpson; Daria Ulybina; Agustin Roitman
    Abstract: We construct sentiment indices for 20 countries from 1980 to 2019. Relying on computational text analysis, we capture specific language like “fear”, “risk”, “hedging”, “opinion”, and, “crisis”, as well as “positive” and “negative” sentiments, in news articles from the Financial Times. We assess the performance of our sentiment indices as “news-based” early warning indicators (EWIs) for financial crises. We find that sentiment indices spike and/or trend up ahead of financial crises.
    Date: 2019–12–06
    URL: http://d.repec.org/n?u=RePEc:imf:imfwpa:19/273&r=all
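    The construction in miniature: count crisis-related terms per article and average into an index. The term list follows the abstract; the per-article normalization is our assumption.
```python
# Share of crisis-related terms per article, averaged into an index.
import re
from collections import Counter

TERMS = {"fear", "risk", "hedging", "opinion", "crisis"}

def term_share(article):
    words = re.findall(r"[a-z]+", article.lower())
    counts = Counter(words)
    return sum(counts[t] for t in TERMS) / max(len(words), 1)

articles = ["Markets fear a banking crisis as risk appetite fades.",
            "Growth outlook steady; opinion split on policy."]
index = sum(map(term_share, articles)) / len(articles)
print(round(index, 3))   # spikes ahead of crises, per the paper's finding
```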
  38. By: Carlo Altavilla; Miguel Boucinha; José-Luis Peydró; Frank Smets
    Abstract: We analyse the effects of supranational versus national banking supervision on credit supply, and its interactions with monetary policy. For identification, we exploit: (i) a new, proprietary dataset based on 15 European credit registers; (ii) the institutional change leading to the centralisation of European banking supervision; (iii) high-frequency monetary policy surprises; (iv) differences across euro area countries, also vis-à-vis non-euro area countries. We show that supranational supervision reduces credit supply to firms with very high ex-ante and ex-post credit risk, while stimulating credit supply to firms without loan delinquencies. Moreover, the increased risk-sensitivity of credit supply driven by centralised supervision is stronger for banks operating in stressed countries. Exploiting heterogeneity across banks, we find that the mechanism driving the results is the higher quantity and quality of human resources available to the supranational supervisor rather than changes in incentives due to the reallocation of supervisory responsibility to the new institution. Finally, there are crucial complementarities between supervision and monetary policy: centralised supervision offsets excessive bank risk-taking induced by a more accommodative monetary policy stance, but does not offset more productive risk-taking. Overall, we show that using multiple credit registers – for the first time in the literature – is crucial for external validity.
    Keywords: supervision, banking, AnaCredit, monetary policy, euro area crisis
    JEL: E51 E52 E58 G01 G21 G28
    Date: 2019–12
    URL: http://d.repec.org/n?u=RePEc:bge:wpaper:1137&r=all
  39. By: Shengli Chen; Zili Zhang
    Abstract: The implied volatility smile surface is the basis of option pricing, and the dynamic evolution of the option volatility smile surface is difficult to predict. In this paper, an attention mechanism is introduced into the LSTM, establishing a volatility surface prediction method that combines deep learning and attention. The LSTM’s forget gate gives it strong generalization ability, and its feedback structure enables it to characterize the long memory of financial volatility. The attention mechanism significantly enhances the LSTM network’s ability to select input features. The experimental results show that the two strategies constructed using the predicted implied volatility surfaces achieve higher returns and Sharpe ratios than the same strategies without surface prediction. This paper confirms that using AI to predict the implied volatility surface has theoretical and economic value. The research method provides a new reference for option pricing and strategy design.
    Date: 2019–12
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1912.11059&r=all
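    One common way to put an attention mechanism on top of an LSTM, in the spirit of the model described: score each time step, softmax the scores, and pool the re-weighted states into a context vector. This is a generic sketch with illustrative sizes, not necessarily the authors' exact architecture.
```python
# LSTM states are scored, softmaxed over time, and pooled into a context.
from tensorflow.keras import Model, layers

T, F, n_out = 20, 10, 5                           # lookback, features, outputs
inp = layers.Input(shape=(T, F))
h = layers.LSTM(64, return_sequences=True)(inp)   # one state per step
score = layers.Dense(1, activation="tanh")(h)     # (batch, T, 1) step scores
attn = layers.Softmax(axis=1)(score)              # attention weights over time
weighted = layers.Multiply()([h, attn])           # re-weighted states
context = layers.GlobalAveragePooling1D()(weighted)  # pooled context vector
out = layers.Dense(n_out)(context)                # e.g. grid of IV points

model = Model(inp, out)
model.compile(optimizer="adam", loss="mse")
model.summary()
```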

This nep-big issue is ©2020 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.