nep-big New Economics Papers
on Big Data
Issue of 2020‒03‒23
thirty papers chosen by
Tom Coupé
University of Canterbury

  1. Terrorist Attacks, Cultural Incidents and the Vote for Radical Parties: Analyzing Text from Twitter By Francesco Giavazzi; Felix Iglhaut; Giacomo Lemoli; Gaia Rubera
  2. Applications of deep learning in stock market prediction: recent progress By Weiwei Jiang
  3. Forecasting Foreign Exchange Rate: A Multivariate Comparative Analysis between Traditional Econometric, Contemporary Machine Learning & Deep Learning Techniques By Manav Kaushik; A K Giri
  4. Optimizing Tax Administration Policies with Machine Learning By Pietro Battiston; Simona Gamba; Alessandro Santoro
  5. Application of Deep Neural Networks to assess corporate Credit Rating By Parisa Golbayani; Dan Wang; Ionut Florescu
  6. The More the Merrier? A Machine Learning Algorithm for Optimal Pooling of Panel Data By Marijn A. Bolhuis; Brett Rayner
  7. Deus ex Machina? A Framework for Macro Forecasting with Machine Learning By Marijn A. Bolhuis; Brett Rayner
  8. Non-stationary neural network for stock return prediction By Steven Y. K. Wong; Jennifer Chan; Lamiae Azizi; Richard Y. D. Xu
  9. AI Watch. Defining Artificial Intelligence. Towards an operational definition and taxonomy of artificial intelligence By Sofia Samoili; Montserrat Lopez Cobo; Emilia Gomez; Giuditta De Prato; Fernando Martinez-Plumed; Blagoj Delipetrev
  10. KryptoOracle: A Real-Time Cryptocurrency Price Prediction Platform Using Twitter Sentiments By Shubhankar Mohapatra; Nauman Ahmed; Paulo Alencar
  11. Machine Learning Treasury Yields By Zura Kakushadze; Willie Yu
  12. Adversarial Attacks on Machine Learning Systems for High-Frequency Trading By Micah Goldblum; Avi Schwarzschild; Naftali Cohen; Tucker Balch; Ankit B. Patel; Tom Goldstein
  13. Public Procurement and Big Data: The Mexican Case By Ana Thaís Martínez; Luis M. Torres
  14. The Evolution of Inequality of Opportunity in Germany: A Machine Learning Approach By Paolo Brunori; Guido Neidhofer
  15. A New Decomposition Ensemble Approach for Tourism Demand Forecasting: Evidence from Major Source Countries By Chengyuan Zhang; Fuxin Jiang; Shouyang Wang; Shaolong Sun
  16. Public Procurement and Big Data: Research in Chile on a Corruption Risk Index By Miguel Jorquera
  17. Environment and Development: Penalized Non-Parametric Inference of Global Trends in Deforestation, Pollution and Carbon By Andree, Bo Pieter Johannes; Spencer, Phoebe Girouard; Chamorro, Andres; Dogo, Harun
  18. Regional Technical Note: Public Procurement and Big Data By Pablo Montes M.
  19. AI Watch - National strategies on Artificial Intelligence: A European perspective in 2019 By Vincent Van Roy
  20. Deep Learning for Asset Bubbles Detection By Oksana Bashchenko; Alexis Marchal
  21. Deep Learning, Jumps, and Volatility Bursts By Oksana Bashchenko; Alexis Marchal
  22. Predictive intraday correlations in stable and volatile market environments: Evidence from deep learning By Ben Moews; Gbenga Ibikunle
  23. TES analysis of AI Worldwide Ecosystem in 2009-2018 By Sofia Samoili; Riccardo Righi; Melisande Cardona; Montserrat Lopez-Cobo; Miguel Vazquez-Prada Baillet; Giuditta De-Prato
  24. Estimation of Poverty in Somalia Using Innovative Methodologies By Pape, Utz Johann; Wollburg, Philip Randolph
  25. Talkin' Bout Cooperation By Özkes, Ali; Hanaki, Nobuyuki
  26. Robots and the Origin of Their Labour-Saving Impact By Montobbio, Fabio; Staccioli, Jacopo; Virgillito, Maria Enrica; Vivarelli, Marco
  27. A new hybrid approach for crude oil price forecasting: Evidence from multi-scale data By Yang Yifan; Guo Ju'e; Sun Shaolong; Li Yixin
  28. Transformers for Limit Order Books By James Wallbridge
  29. Central Bank Tone and the Dispersion of Views within Monetary Policy Committees By Paul Hubert; Fabien Labondance
  30. PDGM: a Neural Network Approach to Solve Path-Dependent Partial Differential Equations By Yuri F. Saporito; Zhaoyu Zhang

  1. By: Francesco Giavazzi; Felix Iglhaut; Giacomo Lemoli; Gaia Rubera
    Abstract: We study how perceived threats from cultural diversity, induced by terrorist attacks and a salient criminal event, affect public discourse and voters' support for far-right parties. We first develop a rule which allocates Twitter users in Germany to electoral districts and then use a machine learning method to compute measures of textual similarity between the tweets they produce and tweets by accounts of the main German parties. Using the dates of these exogenous events, we estimate constituency-level shifts in similarity to party language. We find that, following these events, Twitter text becomes on average more similar to that of the main far-right party, AfD, while the opposite happens for some of the other parties. Regressing estimated shifts in similarity on changes in vote shares between federal elections, we find a significant association. Our results point to the role of perceived threats in the success of nationalist parties.
    JEL: C45 D72 H56
    Date: 2020–03
    URL: http://d.repec.org/n?u=RePEc:nbr:nberwo:26825&r=all
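    A minimal sketch of the kind of text-similarity measure described above, using TF-IDF vectors and cosine similarity; the paper's actual similarity model is more sophisticated, and the party corpora and toy tweet below are illustrative placeholders:

```python
# Hedged sketch: similarity between user tweets and party-account text.
# Party texts and the sample tweet are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

party_corpora = {
    "AfD": "grenzen sichern kriminalitaet stoppen heimat schuetzen",
    "SPD": "soziale gerechtigkeit arbeit rente solidaritaet mindestlohn",
}
user_tweets = ["kriminalitaet steigt, wir muessen die grenzen sichern"]

vec = TfidfVectorizer()
X = vec.fit_transform(list(party_corpora.values()) + user_tweets)
party_vecs = X[: len(party_corpora)]   # one row per party
tweet_vecs = X[len(party_corpora):]    # one row per user tweet

for i, name in enumerate(party_corpora):
    sim = cosine_similarity(tweet_vecs, party_vecs[i]).mean()
    print(f"mean similarity to {name}: {sim:.3f}")
```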
  2. By: Weiwei Jiang
    Abstract: Stock market prediction has been a classical yet challenging problem that has attracted the attention of both economists and computer scientists. In pursuit of an effective prediction model, both linear and machine learning tools have been explored over the past couple of decades. Lately, deep learning models have opened new frontiers for this topic, and the field is developing rapidly. Our motivation for this survey is therefore to provide an up-to-date review of recent work on deep learning models for stock market prediction. We categorize not only the different data sources, neural network structures, and commonly used evaluation metrics, but also the implementation and reproducibility of the studies. Our goal is to help interested researchers stay abreast of the latest progress and to make it easier for them to reproduce previous studies as baselines. Based on this summary, we also highlight some future research directions in this topic.
    Date: 2020–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2003.01859&r=all
  3. By: Manav Kaushik; A K Giri
    Abstract: In today's global economy, accuracy in predicting macroeconomic parameters such as the foreign exchange rate, or at least estimating the trend correctly, is of key importance for any future investment. In recent times, the use of computational intelligence-based techniques for forecasting macroeconomic variables has proven highly successful. This paper develops a multivariate time series approach to forecast the USD/INR exchange rate while comparing the performance of three multivariate prediction modelling techniques: Vector Auto Regression (a traditional econometric technique), Support Vector Machine (a contemporary machine learning technique), and Recurrent Neural Networks (a contemporary deep learning technique). We use monthly historical data on several macroeconomic variables for the USA and India from April 1994 to December 2018 to predict the USD-INR foreign exchange rate. The results clearly show that the contemporary SVM and RNN (Long Short-Term Memory) techniques outperform the widely used traditional Vector Auto Regression method. The RNN model with Long Short-Term Memory (LSTM) provides the highest accuracy (97.83%), followed by the SVM model (97.17%) and the VAR model (96.31%). Finally, we present a brief analysis of the correlation and interdependencies of the variables used for forecasting.
    Date: 2020–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2002.10247&r=all
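    A hedged sketch of this kind of model comparison, pitting a support vector regression against a linear (VAR-style) baseline on lagged predictors; the data below are synthetic, not the USD/INR series, and the paper additionally uses an LSTM:

```python
# Hedged sketch: SVR vs. a linear baseline on synthetic monthly macro data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR

rng = np.random.default_rng(0)
T, k = 300, 4                            # 300 months, 4 macro predictors
X = rng.normal(size=(T, k)).cumsum(axis=0)
fx = 60 + 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(scale=0.5, size=T)

lagged, target = X[:-1], fx[1:]          # predict next month from lags
split = int(0.8 * len(target))
Xtr, Xte = lagged[:split], lagged[split:]
ytr, yte = target[:split], target[split:]

for name, model in [("Linear (VAR-style)", LinearRegression()),
                    ("SVR", SVR(C=10.0))]:
    pred = model.fit(Xtr, ytr).predict(Xte)
    mape = np.mean(np.abs((yte - pred) / yte)) * 100
    print(f"{name}: accuracy = {100 - mape:.2f}%")   # accuracy as 100 - MAPE
```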
  4. By: Pietro Battiston; Simona Gamba; Alessandro Santoro
    Abstract: Tax authorities around the world are increasingly employing data mining and machine learning algorithms to predict individual behaviours. Although the traditional literature on optimal tax administration provides useful tools for ex-post evaluation of policies, it disregards the problem of which taxpayers to target. This study identifies and characterises a loss function that assigns a social cost to any prediction-based policy. We define this measure as the difference between the social welfare of a given policy and that of an ideal policy unaffected by prediction errors. We show how this loss function is related to the receiver operating characteristic curve, a standard statistical tool used to evaluate prediction performance. Subsequently, we apply our measure to the prediction of inaccurate tax returns issued by self-employed workers and sole proprietorships in Italy. In our application, a random forest model provides the best prediction: we show how it can be interpreted using measures of variable importance developed in the machine learning literature.
    Keywords: policy prediction problems, tax behaviour, big data, machine learning
    JEL: H26 H32 C53
    Date: 2020–03
    URL: http://d.repec.org/n?u=RePEc:mib:wpaper:436&r=all
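    A minimal sketch of the random forest plus ROC-curve machinery the abstract refers to: score taxpayers by predicted risk, summarise attainable trade-offs with the AUC, and inspect variable importances. The data are simulated, not the Italian tax records used in the paper:

```python
# Hedged sketch: audit-targeting prediction evaluated via the ROC curve.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 5000
X = rng.normal(size=(n, 6))                      # taxpayer features
p = 1 / (1 + np.exp(-(X[:, 0] + 0.5 * X[:, 1])))
y = rng.binomial(1, p)                           # 1 = inaccurate return

Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(Xtr, ytr)
score = rf.predict_proba(Xte)[:, 1]              # predicted audit risk
print("AUC:", round(roc_auc_score(yte, score), 3))
print("variable importances:", rf.feature_importances_.round(3))
```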
  5. By: Parisa Golbayani; Dan Wang; Ionut Florescu
    Abstract: Recent literature implements machine learning techniques to assess corporate credit ratings based on financial statement reports. In this work, we analyze the performance of four neural network architectures (MLP, CNN, CNN2D, LSTM) in predicting corporate credit ratings as issued by Standard and Poor's. We analyze companies from the energy, financial and healthcare sectors in the US. The goal of the analysis is to improve the application of machine learning algorithms to credit assessment. To this end, we focus on three questions. First, do the algorithms perform better when using a selected subset of features, or is it better to allow the algorithms to select features themselves? Second, is the temporal aspect inherent in financial data important for the results obtained by a machine learning algorithm? Third, is there a particular neural network architecture that consistently outperforms others with respect to input features, sectors and holdout sets? We create several case studies to answer these questions and analyze the results using ANOVA and a multiple-comparison testing procedure.
    Date: 2020–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2003.02334&r=all
  6. By: Marijn A. Bolhuis; Brett Rayner
    Abstract: We leverage insights from machine learning to optimize the tradeoff between bias and variance when estimating economic models using pooled datasets. Specifically, we develop a simple algorithm that estimates the similarity of economic structures across countries and selects the optimal pool of countries to maximize out-of-sample prediction accuracy of a model. We apply the new algorithm to nowcasting output growth with a panel of 102 countries and are able to significantly improve forecast accuracy relative to alternative pools. The algorithm improves nowcast performance for advanced economies, as well as emerging market and developing economies, suggesting that machine learning techniques using pooled data could be an important macro tool for many countries.
    Date: 2020–02–28
    URL: http://d.repec.org/n?u=RePEc:imf:imfwpa:20/44&r=all
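    A toy sketch of the pooling idea: add a candidate country to the training pool only if it lowers the target country's out-of-sample error. The IMF paper's actual similarity-based algorithm differs; the panels below are simulated and the greedy rule is a simplification:

```python
# Hedged sketch: greedy country pooling for out-of-sample accuracy.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)

def make_country(slope):
    X = rng.normal(size=(80, 3))
    return X, X @ np.array([slope, 0.5, -0.2]) + rng.normal(scale=0.3, size=80)

target_X, target_y = make_country(1.0)
candidates = {f"country_{i}": make_country(s)
              for i, s in enumerate([0.9, 1.1, -0.5, 1.0, 0.1])}

def oos_error(pool):
    # fit on the target's first 60 obs plus the pool, test on its last 20
    Xs = np.vstack([target_X[:60]] + [c[0] for c in pool])
    ys = np.hstack([target_y[:60]] + [c[1] for c in pool])
    m = LinearRegression().fit(Xs, ys)
    return np.mean((m.predict(target_X[60:]) - target_y[60:]) ** 2)

pool, best = [], oos_error([])
for name, data in candidates.items():
    err = oos_error(pool + [data])
    if err < best:                      # keep a country only if it helps
        pool.append(data); best = err
        print(f"added {name}, OOS MSE -> {best:.4f}")
print("final pool size:", len(pool), "| OOS MSE:", round(best, 4))
```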
  7. By: Marijn A. Bolhuis; Brett Rayner
    Abstract: We develop a framework to nowcast (and forecast) economic variables with machine learning techniques. We explain how machine learning methods can address common shortcomings of traditional OLS-based models and use several machine learning models to predict real output growth with lower forecast errors than traditional models. By combining multiple machine learning models into ensembles, we lower forecast errors even further. We also identify measures of variable importance to help improve the transparency of machine learning-based forecasts. Applying the framework to Turkey reduces forecast errors by at least 30 percent relative to traditional models. The framework also better predicts economic volatility, suggesting that machine learning techniques could be an important part of the macro forecasting toolkit of many countries.
    Date: 2020–02–28
    URL: http://d.repec.org/n?u=RePEc:imf:imfwpa:20/45&r=all
  8. By: Steven Y. K. Wong (University of Technology Sydney); Jennifer Chan (University of Sydney); Lamiae Azizi (University of Sydney); Richard Y. D. Xu (University of Technology Sydney)
    Abstract: We consider the problem of neural network training in a time-varying context. Machine learning algorithms have excelled in problems that do not change over time. However, problems encountered in financial markets are often non-stationary. We propose the online early stopping algorithm and show that a neural network trained using this algorithm can track a function changing with unknown dynamics. We apply the proposed algorithm to the stock return prediction problem studied in Gu et al. (2019) and achieve a mean rank correlation of 4.69%, almost twice as high as the expanding window approach. We also show that prominent factors, such as the size effect and momentum, exhibit time-varying stock return predictiveness.
    Date: 2020–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2003.02515&r=all
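    A toy illustration of the early-stopping intuition for tracking a drifting target: at each period the model takes only a limited number of gradient steps, with the step budget validated on held-out observations. The paper's actual algorithm for neural networks differs; this is a simplified linear-model sketch in numpy:

```python
# Hedged sketch: limited gradient steps per period to track a drifting target.
import numpy as np

rng = np.random.default_rng(3)
d, T = 5, 200
w_true = rng.normal(size=d)      # unknown, slowly drifting coefficients
w = np.zeros(d)                  # model weights
errors = []
for t in range(T):
    w_true += 0.05 * rng.normal(size=d)          # unobserved drift
    X = rng.normal(size=(32, d))
    y = X @ w_true + 0.1 * rng.normal(size=32)
    errors.append(np.mean((X @ w - y) ** 2))     # evaluate before updating
    # choose a step budget by validating on half of the new batch
    best_k, best_err = 0, np.inf
    for k in (1, 5, 20):
        w_try = w.copy()
        for _ in range(k):
            grad = X[:16].T @ (X[:16] @ w_try - y[:16]) / 16
            w_try -= 0.1 * grad
        val = np.mean((X[16:] @ w_try - y[16:]) ** 2)
        if val < best_err:
            best_err, best_k = val, k
    for _ in range(best_k):                      # apply the chosen budget
        w -= 0.1 * (X.T @ (X @ w - y) / 32)
print("mean tracking MSE:", round(float(np.mean(errors)), 4))
```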
  9. By: Sofia Samoili (European Commission - JRC); Montserrat Lopez Cobo (European Commission - JRC); Emilia Gomez (European Commission - JRC); Giuditta De Prato (European Commission - JRC); Fernando Martinez-Plumed (Universitat Politecnica de Valencia); Blagoj Delipetrev (European Commission - JRC)
    Abstract: This report proposes an operational definition of artificial intelligence to be adopted in the context of AI Watch, the Commission knowledge service to monitor the development, uptake and impact of artificial intelligence for Europe. The definition, which will be used as a basis for the AI Watch monitoring activity, is established by means of a flexible scientific methodology that allows regular revision. The operational definition is constituted by a concise taxonomy and a list of keywords that characterise the core domains of the AI research field, and transversal topics such as applications of the former or ethical and philosophical considerations, in line with the wider monitoring objective of AI Watch. The AI taxonomy is designed to inform the AI landscape analysis and is expected to detect AI applications in neighbouring technological domains such as robotics (in a broader sense), neuroscience or the internet of things. The starting point for developing the operational definition is the definition of AI adopted by the High Level Expert Group on artificial intelligence. To derive this operational definition we have followed a mixed methodology. On the one hand, we apply natural language processing methods to a large set of AI literature. On the other hand, we carry out a qualitative analysis of 55 key documents including artificial intelligence definitions from three complementary perspectives: policy, research and industry. A valuable contribution of this work is the collection of definitions developed between 1955 and 2019, and the summarisation of the main features of the concept of artificial intelligence as reflected in the relevant literature.
    Keywords: artificial intelligence, ai watch, ai definition, ai taxonomy, ai keywords, industry, market, research
    Date: 2020–02
    URL: http://d.repec.org/n?u=RePEc:ipt:iptwpa:jrc118163&r=all
  10. By: Shubhankar Mohapatra; Nauman Ahmed; Paulo Alencar
    Abstract: Cryptocurrencies, such as Bitcoin, are becoming increasingly popular and are widely used as an exchange medium in areas such as financial transactions and asset transfer verification. However, existing solutions fall short in several respects: they do not support real-time price prediction that copes with high currency volatility; they do not handle massive heterogeneous data volumes, including social media sentiments, with fault tolerance and persistence in real time; and they do not adapt learning algorithms in real time to new price and sentiment data. In this paper we introduce KryptoOracle, a novel real-time and adaptive cryptocurrency price prediction platform based on Twitter sentiments. The integrative and modular platform is based on (i) a Spark-based architecture which handles the large volume of incoming data in a persistent and fault-tolerant way; (ii) an approach that supports sentiment analysis and can respond to large numbers of natural language processing queries in real time; and (iii) a predictive method grounded in online learning, in which a model adapts its weights to cope with new prices and sentiments. Besides providing an architectural design, the paper also describes the KryptoOracle platform implementation and an experimental evaluation. Overall, the proposed platform can help accelerate decision-making, uncover new opportunities and provide more timely insights based on the available and ever-larger financial data volume and variety.
    Date: 2020–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2003.04967&r=all
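    A minimal sketch of the online-learning component described in point (iii): a regressor that updates its weights incrementally as new price/sentiment pairs arrive. KryptoOracle's actual Spark pipeline and model differ; the simulated stream and feature construction here are illustrative:

```python
# Hedged sketch: incremental weight updates on a simulated price/sentiment stream.
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(4)
model = SGDRegressor(learning_rate="constant", eta0=0.01)
price, errs = 9000.0, []
for minute in range(500):                 # simulated data stream
    sentiment = rng.uniform(-1, 1)        # aggregated tweet sentiment
    features = np.array([[price / 1e4, sentiment]])
    next_price = price * (1 + 0.001 * sentiment + 0.002 * rng.normal())
    if minute > 0:                        # forecast before seeing the truth
        errs.append(abs(model.predict(features)[0] - next_price))
    model.partial_fit(features, [next_price])   # adapt weights online
    price = next_price
print("mean abs one-step error:", round(float(np.mean(errs)), 2))
print("current coefficients:", model.coef_.round(3))
```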
  11. By: Zura Kakushadze; Willie Yu
    Abstract: We give explicit algorithms and source code for extracting factors underlying Treasury yields using (unsupervised) machine learning (ML) techniques, such as nonnegative matrix factorization (NMF) and (statistically deterministic) clustering. NMF is a popular ML algorithm (used in computer vision, bioinformatics/computational biology, document classification, etc.), but is often misconstrued and misused. We discuss how to properly apply NMF to Treasury yields. We analyze the factors based on NMF and clustering and their interpretation. We discuss their implications for forecasting Treasury yields in the context of out-of-sample ML stability issues.
    Date: 2020–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2003.05095&r=all
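    A minimal sketch of applying NMF to a panel of yields, which is possible because yields are nonnegative. The synthetic level/slope curve below stands in for the Treasury data and the paper's specific NMF setup:

```python
# Hedged sketch: NMF factor extraction from synthetic yield-curve data.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(5)
maturities = np.array([0.25, 1, 2, 5, 10, 30])   # years
days = 250
level = 2 + 0.5 * rng.random(days)
slope = 0.8 * rng.random(days)
Y = level[:, None] + slope[:, None] * np.log1p(maturities)[None, :]

nmf = NMF(n_components=2, init="nndsvda", max_iter=1000, random_state=0)
W = nmf.fit_transform(Y)          # daily factor loadings
H = nmf.components_               # factors across maturities
print("factor shapes:", W.shape, H.shape)
print("reconstruction error:", round(nmf.reconstruction_err_, 4))
```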
  12. By: Micah Goldblum; Avi Schwarzschild; Naftali Cohen; Tucker Balch; Ankit B. Patel; Tom Goldstein
    Abstract: Algorithmic trading systems are often completely automated, and deep learning is increasingly receiving attention in this domain. Nonetheless, little is known about the robustness properties of these models. We study valuation models for algorithmic trading from the perspective of adversarial machine learning. We introduce new attacks specific to this domain with size constraints that minimize attack costs. We further discuss how these attacks can be used as an analysis tool to study and evaluate the robustness properties of financial models. Finally, we investigate the feasibility of realistic adversarial attacks in which an adversarial trader fools automated trading systems into making inaccurate predictions.
    Date: 2020–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2002.09565&r=all
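    A toy illustration of a size-constrained adversarial perturbation: for a linear valuation model the loss gradient is just the weight vector, so an FGSM-style step is a signed move bounded by a budget. The paper attacks deep trading models with domain-specific constraints; this only conveys the mechanics:

```python
# Hedged sketch: FGSM-style perturbation of a linear valuation model.
import numpy as np

rng = np.random.default_rng(6)
w = rng.normal(size=10)                  # linear model: score = w @ x
x = rng.normal(size=10)                  # order-book-style features
eps = 0.01                               # size constraint on the attack

x_adv = x - eps * np.sign(w)             # push the predicted score down
print("clean score:   ", round(float(w @ x), 4))
print("attacked score:", round(float(w @ x_adv), 4))
print("perturbation L-inf norm:", np.max(np.abs(x_adv - x)))
```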
  13. By: Ana Thaís Martínez; Luis M. Torres
    Abstract: The objective of this study is to explore and propose new ways of using an adequate data policy for public procurement. Most governments have electronic public procurement platforms which, in addition to serving as a transactional space between government and suppliers, generate detailed information on each purchasing process that allows governments to evaluate the performance of their procurement systems. Analysing the information contained in these platforms can be instrumental in diagnosing competition problems in public procurement, flagging irregularities in procedures, and identifying areas of discretion susceptible to acts of corruption. Likewise, structured data from the purchasing history can be used to find cycles and patterns that allow better planning or reveal deficiencies in procurement scheduling. The diagnostics that structured data make possible may be the missing piece for proposing and designing the public policy interventions needed to ensure the best purchasing conditions for the State, procedures that comply with the law, and public spending that is more transparent, more competitive, and less exposed to corruption. It is therefore necessary to understand, first, what Big Data is and how it can be used from government platforms, and second, to analyse examples of how this approach has been applied in the country. The document is organised as follows. The first section describes in general terms what the large volumes of data generated by public procurement systems are and why they can be useful. The second section reviews the conditions necessary to exploit Big Data from the information generated through these procurement systems. The third section provides a diagnosis of the implementation or existence of the conditions necessary to use data from procurement systems in Mexico. The fourth section gives examples of progress made in Mexico in the use of Big Data in public procurement. Finally, the document presents conclusions and public policy recommendations.
    Keywords: Corruption, Public Procurement, Big Data, Public Policy, Mexico
    JEL: D73 F43 H57 H83 L38 O10
    Date: 2019–07–24
    URL: http://d.repec.org/n?u=RePEc:col:000124:017928&r=all
  14. By: Paolo Brunori (University of Florence); Guido Neidhofer (ZEW - Leibniz Centre for European Economic Research)
    Abstract: We show that measures of inequality of opportunity (IOP) fully consistent with Roemer (1998)'s IOP theory can be straightforwardly estimated by adopting a machine learning approach, and we apply our novel method to analyse the development of IOP in Germany during the last three decades. To do so, we take advantage of information contained in 25 waves of the Socio-Economic Panel. Our analysis shows that IOP in Germany declined immediately after reunification, increased in the first decade of the century, and slightly declined again after 2010. Over the entire period, at the top of the distribution we consistently find individuals who resided in West Germany before the fall of the Berlin Wall, whose fathers had a high occupational position, and whose mothers had a high educational degree. Individuals who resided in East Germany in 1989 and whose parents had low education persistently rank at the bottom.
    JEL: D63 D30 D31
    Date: 2020–02
    URL: http://d.repec.org/n?u=RePEc:dls:wpaper:0259&r=all
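    A minimal sketch of the ex-ante IOP logic: predict income from circumstance variables with a machine learning model and measure the inequality of the predictions. A random forest and a simple Gini coefficient stand in for the paper's estimator and data, which are richer (SOEP waves, tree-based type estimation):

```python
# Hedged sketch: inequality of opportunity as inequality of circumstance-
# predicted incomes, on simulated data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def gini(x):
    x = np.sort(x)
    n = len(x)
    return (2 * np.arange(1, n + 1) - n - 1) @ x / (n * x.sum())

rng = np.random.default_rng(7)
n = 3000
circumstances = rng.integers(0, 3, size=(n, 3))   # e.g. parental education
income = np.exp(1 + 0.3 * circumstances.sum(axis=1)
                + rng.normal(scale=0.5, size=n))   # effort/luck residual

types = RandomForestRegressor(random_state=0).fit(circumstances, income)
print("total inequality (Gini):", round(gini(income), 3))
print("IOP (Gini of predicted incomes):",
      round(gini(types.predict(circumstances)), 3))
```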
  15. By: Chengyuan Zhang; Fuxin Jiang; Shouyang Wang; Shaolong Sun
    Abstract: The Asia-Pacific region is the major international tourism demand market in the world, and its tourism demand is deeply affected by various factors. Previous studies have shown that different market factors influence tourism market demand at different timescales. Accordingly, a decomposition ensemble learning approach is proposed to analyze the impact of different market factors on market demand, and the potential advantages of the proposed method for forecasting tourism demand in the Asia-Pacific region are further explored. This study carefully explores the multi-scale relationship between tourist destinations and the major source countries by decomposing the corresponding monthly tourist arrivals with noise-assisted multivariate empirical mode decomposition. With China and Malaysia as case studies, the empirical results show that the decomposition ensemble approach performs significantly better than the benchmarks, which include statistical, machine learning and deep learning models, in terms of both level forecasting accuracy and directional forecasting accuracy.
    Date: 2020–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2002.09201&r=all
  16. By: Miguel Jorquera
    Abstract: According to the 2015 Consejo Asesor Presidencial contra los Conflictos de Intereses, el Tráfico de Influencia y la Corrupción (Presidential Advisory Council against Conflicts of Interest, Influence Peddling and Corruption): "Public procurement is a key aspect of State administration and a determining factor in the quality of the services the State delivers and the infrastructure it provides. The proper functioning of the system thus engages the general interest, and it is therefore vitally important that its mechanisms and processes ensure transparency and efficiency, promote competition and minimise the risks of corruption." (Consejo Asesor Presidencial, 2015). Public procurement systems are increasingly regarded as a strategic governance tool for governments. However, the sector has characteristics that make it vulnerable to corruption, which puts at risk all the potential benefits it can produce in different areas of the economy. This has created the need to constantly seek ways to improve its functioning, reflected in studies by various actors that have advanced diverse proposals. One of them has been to use the big data originating in electronic public procurement platforms to determine potential corruption risks. The procurement system in Chile is not immune to corruption, so, following the recommendations of the literature, this document studies the Chilean public procurement system using data as a tool in the fight against corruption. To this end, it uses the information produced by the electronic platform Mercado Público, following the methodology designed by the Instituto Mexicano para la Competitividad (IMCO). This methodology proceeds in two sequential steps: first, searching for risk patterns through exploratory exercises on buyers, suppliers and contract types; second, constructing a corruption risk index based on indicators of competition, transparency and anomalies or violations of the law.
    Keywords: Corruption, Public Procurement, Big Data, Corruption Risk Index, Public Policy, Chile
    JEL: D73 F43 H57 H83 L38 O10
    Date: 2019–07–24
    URL: http://d.repec.org/n?u=RePEc:col:000124:017924&r=all
  17. By: Andree, Bo Pieter Johannes; Spencer, Phoebe Girouard; Chamorro, Andres; Dogo, Harun
    Abstract: This paper revisits the issue of environment and development raised in the 1992 World Development Report, with new analysis tools and data. The paper discusses inference and interpretation in a machine learning framework. The results suggest that production gradually favors conserving the earth's resources as gross domestic product increases, but increased efficiency alone is not sufficient to offset the effects of growth in scale. Instead, structural change in the economy shapes environmental outcomes across GDP. The analysis finds that average development is associated with an inverted U-shape in deforestation, pollution, and carbon intensities. Per capita emissions follow a J-curve. Specifically, poverty reduction occurs alongside degrading local environments and higher income growth poses a global burden through carbon. Local economic structure further determines the shape, amplitude, and location of tipping points of the Environmental Kuznets Curve. The models are used to extrapolate environmental output to 2030. The daunting implications of continued development are a reminder that immediate and sustained global efforts are required to mitigate forest loss, improve air quality, and shift the global economy to a 2° pathway.
    Keywords: Global Environment, Inequality, Environmental Disasters & Degradation, Common Carriers Industry, Food & Beverage Industry, Plastics & Rubber Industry, Business Cycles and Stabilization Policies, Textiles, Apparel & Leather Industry, Pulp & Paper Industry, Construction Industry, General Manufacturing, Nutrition
    Date: 2019–02–25
    URL: http://d.repec.org/n?u=RePEc:wbk:wbrwps:8756&r=all
  18. By: Pablo Montes M.
    Abstract: The main objective of the Laboratorio Latinoamericano de Políticas de Probidad y Transparencia (Latin American Laboratory for Probity and Transparency Policies) is to generate an applied research agenda that contributes to promoting integrity policies in Latin America. The Laboratory comprises four research centres: the Fundación para la Educación Superior y Desarrollo (FEDESARROLLO) in Colombia, the Instituto Mexicano para la Competitividad (IMCO), the Centro de Sistemas Públicos of the Universidad de Chile, and Espacio Público, also based in Chile. The initiative is supported by the Inter-American Development Bank (IDB) through a regional technical cooperation that finances the institutionalisation of the Laboratory as well as the development and dissemination of its research agenda. The Laboratory plans studies on three main themes: "Public procurement and Big Data", "Harmonisation and comparison of public procurement" and "Lessons from emblematic corruption cases". This document synthesises and presents the findings of the research each centre carried out in its own country on "Public procurement and Big Data", with the aim of offering conclusions and recommendations at the regional level. The content comes mostly from the respective studies and is presented in summary form to contextualise and support the recommendations.
    Keywords: Corruption, Public Procurement, Big Data, Public Policy, Latin America, Chile, Colombia, Mexico
    JEL: D73 F43 H57 H83 L38 O10
    Date: 2019–07–24
    URL: http://d.repec.org/n?u=RePEc:col:000124:017926&r=all
  19. By: Vincent Van Roy (European Commission - JRC)
    Abstract: One of the key priorities of the European Commission's Coordinated Plan on AI is to encourage Member States to develop their national AI strategies by the end of 2019, outlining investment levels and implementation measures. The objective of this report is to present and gather information on all EU Member States' national AI strategies in a structured and comprehensive way. It aims to help Member States compare their strategies and identify areas for strengthening synergies and collaboration. Published national AI strategies are analysed to identify the most relevant policy areas and to develop a common AI Policy Framework that can be used for the presentation of policy initiatives. In this sense, this report follows an approach similar to that used in the AI strategies, presenting policy initiatives from a holistic perspective. To highlight the numerous economic and policy outlooks from which the transformative nature of AI can be explored, this report presents policy initiatives across various policy areas, including human capital (i.e. educational development), from the lab to the market (i.e. research & development, innovation, business and public sector development), networking (i.e. collaboration and dissemination), regulation (i.e. ethical guidelines, legislation and standardisation) and infrastructure (i.e. data and telecommunication infrastructure). In this endeavour, the European Commission notably collaborated with the OECD on collecting and analysing national strategies on Artificial Intelligence in the EU Member States. This report will be updated and released on an annual basis.
    Keywords: Industrial research and innovation, Financial and economic analysis, Digital Economy, ICT R&D and Innovation
    Date: 2020–02
    URL: http://d.repec.org/n?u=RePEc:ipt:iptwpa:jrc119974&r=all
  20. By: Oksana Bashchenko (HEC Lausanne; Swiss Finance Institute); Alexis Marchal (EPFL; SFI)
    Abstract: We develop a methodology for detecting asset bubbles using a neural network. We rely on the theory of local martingales in continuous time and use a deep network to estimate the diffusion coefficient of the price process more accurately than the current estimator, obtaining an improved detection of bubbles. We show that our algorithm outperforms the existing statistical method in a laboratory created with simulated data. We then apply the network classification to real data and build a zero-net-exposure trading strategy that exploits the risky arbitrage emanating from the presence of bubbles in the US equity market from 2006 to 2008. The profitability of the strategy provides an estimate of the economic magnitude of bubbles as well as support for the theoretical assumptions relied upon.
    Keywords: Bubbles, Strict local martingales, High-frequency data, Deep learning, LSTM
    JEL: C22 C45 C58 G12
    Date: 2020–03
    URL: http://d.repec.org/n?u=RePEc:chf:rpseri:rp2008&r=all
  21. By: Oksana Bashchenko (HEC Lausanne; Swiss Finance Institute); Alexis Marchal (EPFL; SFI)
    Abstract: We develop a new method that detects jumps nonparametrically in financial time series and significantly outperforms the current benchmark on simulated data. We use a long short-term memory (LSTM) neural network that is trained on labelled data generated by a process that experiences both jumps and volatility bursts. As a result, the network learns how to disentangle the two. It is then applied to out-of-sample simulated data and delivers results that differ considerably from the benchmark: we obtain fewer spurious detections and identify a larger number of true jumps. When applied to real data, our approach to jump screening allows us to extract a more precise signal about future volatility.
    Keywords: Jumps, Volatility Burst, High-Frequency Data, Deep Learning, LSTM
    JEL: C14 C32 C45 C58 G17
    Date: 2020–03
    URL: http://d.repec.org/n?u=RePEc:chf:rpseri:rp2010&r=all
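    A minimal sketch of the training-data generation step described above: simulate returns containing both a volatility burst and genuine jumps, with a label for each observation. The paper trains an LSTM on such labelled paths; the process parameters here are illustrative:

```python
# Hedged sketch: labelled jump/burst data for supervised jump detection.
import numpy as np

rng = np.random.default_rng(8)
T = 2000
sigma = np.full(T, 0.01)
burst = slice(800, 900)
sigma[burst] = 0.04                        # volatility burst episode
returns = sigma * rng.normal(size=T)       # diffusive returns

jump_times = rng.choice(T, size=5, replace=False)
returns[jump_times] += rng.choice([-1, 1], size=5) * 0.08   # true jumps

labels = np.zeros(T, dtype=int)            # 0 = diffusion, 1 = jump
labels[jump_times] = 1                     # burst returns stay labelled 0
print("labelled jumps:", labels.sum(),
      "| burst-period std:", round(returns[burst].std(), 4))
```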
  22. By: Ben Moews; Gbenga Ibikunle
    Abstract: Standard methods and theories in finance can be ill-equipped to capture highly non-linear interactions in financial prediction problems based on large-scale datasets, with deep learning offering a way to gain insights into correlations in markets as complex systems. In this paper, we apply deep learning to econometrically constructed gradients to learn and exploit lagged correlations among S&P 500 stocks to compare model behaviour in stable and volatile market environments, and under the exclusion of target stock information for predictions. In order to measure the effect of time horizons, we predict intraday and daily stock price movements in varying interval lengths and gauge the complexity of the problem at hand with a modification of our model architecture. Our findings show that accuracies, while remaining significant and demonstrating the exploitability of lagged correlations in stock markets, decrease with shorter prediction horizons. We discuss implications for modern finance theory and our work's applicability as an investigative tool for portfolio managers. Lastly, we show that our model's performance is consistent in volatile markets by exposing it to the environment of the recent financial crisis of 2007/2008.
    Date: 2020–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2002.10385&r=all
  23. By: Sofia Samoili (European Commission - JRC); Riccardo Righi (European Commission - JRC); Melisande Cardona (European Commission - JRC); Montserrat Lopez-Cobo (European Commission - JRC); Miguel Vazquez-Prada Baillet (European Commission - JRC); Giuditta De-Prato (European Commission - JRC)
    Abstract: This report analyses and compares countries and regions in the evolving international industrial and research landscape of Artificial Intelligence (AI). The evidence presented is based on a unique database covering the years 2009-2018. The database has been specifically built from a multitude of sources to provide scientific evidence and monitor the AI landscape worldwide. Companies, universities, research institutes and governmental authorities with an active role in AI are identified and analysed in an aggregated fashion. The report presents a wide variety of indicators, allowing us to expand our knowledge on issues such as: the size of the AI ecosystem globally and at country level; which are the main global competitors of the EU; what is the level of industrial involvement per country; what are the firms’ demographics, profiling of economic agents according to their strengths in innovation and take-up of AI, including their patenting performance; and the degree of internal and external collaborations between EU and non-EU firms and research institutions. The analysis of the AI activities developed by agents in the studied territories provides interesting insights on their areas of specialisation, highlighting the strengths of the EU and its Member States in the global landscape. Each section offers a focus on EU Member States.
    Keywords: artificial intelligence, techno-economic analysis, digital transformation, AI landscape, AI thematic area, network of collaborations, AI industry
    Date: 2020–02
    URL: http://d.repec.org/n?u=RePEc:ipt:iptwpa:jrc120106&r=all
  24. By: Pape, Utz Johann; Wollburg, Philip Randolph
    Abstract: Somalia is highly data-deprived, leaving policy makers to operate in a statistical vacuum. To overcome this challenge, the World Bank implemented wave 2 of the Somali High Frequency Survey to better understand livelihoods and vulnerabilities and, especially, to estimate national poverty indicators. The specific context of insecurity and lack of statistical infrastructure in Somalia posed several challenges for implementing a household survey and measuring poverty. This paper outlines how these challenges were overcome in wave 2 of the Somali High Frequency Survey through methodological and technological adaptations in four areas. First, in the absence of a recent census, no exhaustive lists of census enumeration areas along with population estimates existed, creating challenges to derive a probability-based representative sample. Therefore, geospatial techniques and high-resolution imagery were used to model the spatial population distribution, build a probability-based population sampling frame, and generate enumeration areas to overcome the lack of a recent population census. Second, although some areas remained completely inaccessible due to insecurity, even most accessible areas held potential risks to the safety of field staff and survey respondents, so that time spent in these areas had to be minimized. To address security concerns, the survey adapted logistical arrangements, sampling strategy using micro-listing, and questionnaire design to limit time on the ground based on the Rapid Consumption Methodology. Third, poverty in completely inaccessible areas had to be estimated by other means. Therefore, the Somali High Frequency Survey relies on correlates derived from satellite imagery and other geo-spatial data to estimate poverty in such areas. Finally, the nonstationary nature of the nomadic population required special sampling strategies.
    Date: 2019–02–12
    URL: http://d.repec.org/n?u=RePEc:wbk:wbrwps:8735&r=all
  25. By: Özkes, Ali; Hanaki, Nobuyuki
    Abstract: We experimentally study the interaction of the effects of the strategic environment and communication on the observed levels of cooperation in two-person finitely repeated games with a Pareto-inefficient Nash equilibrium. We replicate previous findings that point to higher levels of tacit cooperation under strategic complementarity compared to strategic substitution. In our data, however, this is not due to differences in levels of reciprocity as suggested previously. Instead, we find that slow learning and noisy choices might drive this effect. When subjects are allowed to communicate in free-form online chat before making choices, cooperation levels increase significantly to the extent that the difference in the two strategic environments disappears. A machine-assisted natural language processing approach shows how the content of communication differs in the two strategic environments.
    Keywords: Communication, Cooperation, Reinforcement learning, Strategic environment, Structural topic modeling
    Date: 2020–03–04
    URL: http://d.repec.org/n?u=RePEc:wiw:wus055:7505&r=all
  26. By: Montobbio, Fabio (Università Cattolica del Sacro Cuore); Staccioli, Jacopo (Università Cattolica del Sacro Cuore); Virgillito, Maria Enrica (Università Cattolica del Sacro Cuore); Vivarelli, Marco (Università Cattolica del Sacro Cuore)
    Abstract: This paper investigates the presence of explicit labour-saving heuristics within robotic patents. It analyses innovative actors engaged in robotic technology and their economic environment (identity, location, industry), and identifies the technological fields particularly exposed to labour-saving innovations. It exploits advanced natural language processing and probabilistic topic modelling techniques on the universe of patent applications at the USPTO between 2009 and 2018, matched with the ORBIS (Bureau van Dijk) firm-level dataset. The results show that labour-saving patent holders comprise not only robot producers but also adopters. Consequently, labour-saving robotic patents appear along the entire supply chain. The paper shows that labour-saving innovations challenge manual activities (e.g. in the logistics sector), activities entailing social intelligence (e.g. in the healthcare sector) and cognitive skills (e.g. learning and predicting).
    Keywords: robotic patents, labour-saving technology, search heuristics, probabilistic topic models
    JEL: O33 J24 C38
    Date: 2020–02
    URL: http://d.repec.org/n?u=RePEc:iza:izadps:dp12967&r=all
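    A minimal sketch of probabilistic topic modelling on patent text, using LDA as the canonical example of this class of models; the paper's NLP pipeline on the full USPTO corpus is far more elaborate, and the three toy "abstracts" below are invented:

```python
# Hedged sketch: LDA topics from toy robotic-patent text.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "robotic arm gripper automates warehouse picking and packing",
    "autonomous navigation robot maps hospital corridors for delivery",
    "machine learning model predicts maintenance needs of assembly robots",
]
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

words = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [words[i] for i in topic.argsort()[-4:]]   # top words per topic
    print(f"topic {k}: {top}")
```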
  27. By: Yang Yifan; Guo Ju'e; Sun Shaolong; Li Yixin
    Abstract: With the accelerated development of Internet technology and growing research on the factors driving crude oil price fluctuations, accessible data such as the Google search volume index (GSVI) are increasingly quantified and incorporated into forecasting approaches. In this paper, we use multi-scale data, including both GSVI data and traditional economic data related to the crude oil price, as independent variables, and propose a new hybrid approach for monthly crude oil price forecasting. This hybrid approach, based on a divide-and-conquer strategy, combines the K-means method, kernel principal component analysis (KPCA) and the kernel extreme learning machine (KELM): K-means is adopted to divide the input data into clusters, KPCA is applied to reduce dimensionality, and KELM is employed for the final crude oil price forecast. The empirical results can be analyzed at the data and method levels. At the data level, GSVI data perform better than economic data in level forecasting accuracy but worse in directional forecasting accuracy, owing to herd behavior, while hybrid data combine their advantages and achieve the best forecasting performance in both level and directional accuracy. At the method level, the approaches with K-means perform better than those without, demonstrating that the divide-and-conquer strategy can effectively improve forecasting performance.
    Date: 2020–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2002.09656&r=all
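    A minimal sketch of the divide-and-conquer pipeline: K-means clusters the inputs, kernel PCA reduces dimension, and a kernel model is fit per cluster. Kernel ridge regression stands in for the paper's KELM (sklearn has no KELM), and the data are simulated:

```python
# Hedged sketch: K-means + KernelPCA + a kernel regressor per cluster.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import KernelPCA
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(9)
X = rng.normal(size=(400, 8))              # stand-in for GSVI + economic data
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.normal(size=400)

clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
for c in range(3):
    idx = clusters == c
    Z = KernelPCA(n_components=4, kernel="rbf").fit_transform(X[idx])
    model = KernelRidge(kernel="rbf", alpha=0.1).fit(Z, y[idx])
    mse = np.mean((model.predict(Z) - y[idx]) ** 2)
    print(f"cluster {c}: n = {idx.sum()}, in-sample MSE = {mse:.4f}")
```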
  28. By: James Wallbridge
    Abstract: We introduce a new deep learning architecture for predicting price movements from limit order books. This architecture uses a causal convolutional network for feature extraction in combination with masked self-attention to update features based on relevant contextual information. The architecture is shown to significantly outperform existing architectures such as those using convolutional networks (CNN) and long short-term memory (LSTM), establishing a new state-of-the-art benchmark for the FI-2010 dataset.
    Date: 2020–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2003.00130&r=all
  29. By: Paul Hubert (Sciences Po-OFCE); Fabien Labondance (Université de Bourgogne Franche-Comté - CRESE - Sciences Po-OFCE)
    Abstract: Does policymakers' choice of words matter? We explore empirically whether the central bank tone conveyed in FOMC statements contains useful information for financial market participants. We quantify central bank tone using computational linguistics and identify exogenous shocks to central bank tone orthogonal to the state of the economy. Using an ARCH model and a high-frequency approach, we find that positive central bank tone increases interest rates at the 1-year maturity. We therefore investigate which pieces of information could be revealed by central bank tone. Our tests suggest that it relates to the dispersion of views among FOMC members. This information may be useful to financial markets in understanding current and future policy decisions. Finally, we show that central bank tone helps predict future policy decisions.
    Keywords: Optimism, FOMC, Dissent, Interest rate expectations, ECB
    JEL: E43 E52 E58
    Date: 2020–01
    URL: http://d.repec.org/n?u=RePEc:fce:doctra:2002&r=all
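    A minimal sketch of a dictionary-based tone score, in the spirit of the computational-linguistics step; the tiny word lists are illustrative stand-ins for the lexicons typically used in this literature (e.g. Loughran-McDonald), and the sample sentence is invented:

```python
# Hedged sketch: net tone of a statement from positive/negative word counts.
POSITIVE = {"strong", "improved", "expansion", "gains", "solid"}
NEGATIVE = {"weak", "declined", "strains", "uncertainty", "subdued"}

def tone(statement: str) -> float:
    words = statement.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / max(pos + neg, 1)   # net tone in [-1, 1]

sample = "economic activity has been expanding at a solid rate despite uncertainty"
print("tone:", round(tone(sample), 3))
```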
  30. By: Yuri F. Saporito; Zhaoyu Zhang
    Abstract: In this paper we propose a generalization of the Deep Galerkin Method (DGM) to deal with Path-Dependent Partial Differential Equations (PPDEs). These equations first appeared in the seminal work of Dupire, where the functional Itô calculus was developed to deal with path-dependent financial derivatives contracts. The method, which we call Path-Dependent DGM (PDGM), consists of using a combination of feed-forward and Long Short-Term Memory architectures to model the solution of the PPDE. We then analyze several numerical examples, many from the financial mathematics literature, that show the capabilities of the method under very different situations.
    Date: 2020–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2003.02035&r=all

This nep-big issue is ©2020 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.