nep-big New Economics Papers
on Big Data
Issue of 2020‒07‒27
thirty-one papers chosen by
Tom Coupé
University of Canterbury

  1. How do machine learning and non-traditional data affect credit scoring? New evidence from a Chinese fintech firm By Gambacorta, Leonardo; Huang, Yiping; Qiu, Han; Wang, Jingyi
  2. Swag: A Wrapper Method for Sparse Learning By Roberto Molinari; Gaetan Bakalli; Stéphane Guerrier; Cesare Miglioli; Samuel Orso; O. Scaillet
  3. Deus ex Machina? A Framework for Macro Forecasting with Machine Learning By Marijn A. Bolhuis; Brett Rayner
  4. The More the Merrier? A Machine Learning Algorithm for Optimal Pooling of Panel Data By Marijn A. Bolhuis; Brett Rayner
  5. Market Efficiency in the Age of Big Data By Martin, Ian; Nagel, Stefan
  6. The Demand for AI Skills in the Labor Market By Alekseeva, Liudmila; Azar, José; Gine, Mireia; Samila, Sampsa; Taska, Bledi
  7. Forecasting volatility with a stacked model based on a hybridized Artificial Neural Network By E. Ramos-Pérez; P. J. Alonso-González; J. J. Núñez-Velázquez
  8. Robust uncertainty sensitivity analysis By Daniel Bartl; Samuel Drapeau; Jan Obloj; Johannes Wiesel
  9. Data Governance, AI, and Trade: Asia as a Case Study By Susan Ariel Aaronson
  10. Real-time turning point indicators: Review of current international practices By Cyrille Lenoel; Garry Young
  11. An unsupervised deep learning approach in solving partial-integro differential equations By Ali Hirsa; Weilong Fu
  12. Merger Policy in Digital Markets: An Ex-Post Assessment By Argentesi, Elena; Buccirossi, Paolo; Calvano, Emilio; Duso, Tomaso; Marrazzo, Alessia; Nava, Salvatore
  13. The Macroeconomy as a Random Forest By Philippe Goulet Coulombe
  14. Priority to unemployed immigrants? A causal machine learning evaluation of training in Belgium By Boolens, Joost; Cockx, Bart; Lechner, Michael
  15. Die Vernetzung Wiens mit den Städten Europas By David Zenz
  16. Turbulence on the Global Economy influenced by Artificial Intelligence and Foreign Policy Inefficiencies By Kwadwo Osei Bonsu; Jie Song
  17. Taming the Factor Zoo: A Test of New Factors By Feng, Gavin; Giglio, Stefano W; Xiu, Dacheng
  18. Modeling Financial Time Series using LSTM with Trainable Initial Hidden States By Jungsik Hwang
  19. The Global Impact of Brexit Uncertainty By Hassan, Tarek Alexander; Hollander, Stephan; Tahoun, Ahmed; van Lent, Laurence
  20. Using Company Specific Headlines and Convolutional Neural Networks to Predict Stock Fluctuations By Jonathan Readshaw; Stefano Giani
  21. Dynamic Networks in Large Financial and Economic Systems By Jozef Barunik; Michael Ellington
  22. Algorithmic Collusion: Supra-competitive Prices via Independent Algorithms By Hansen, Karsten; Misra, Kanishka; Pai, Mallesh
  23. On the Causes and Consequences of Deviations from Rational Behavior By Dainis Zegners; Uwe Sunde; Anthony Strittmatter
  24. Media Attention vs. Sentiment as Drivers of Conditional Volatility Predictions: An Application to Brexit By Massimo Guidolin; Manuela Pedio
  25. Theoretical approaches to forecasting regional macro-indicators By Gorshkova, Taisiya (Горшкова, Таисия); Turuntseva, Marina (Турунцева, Марина)
  26. Where Should We Go? Internet Searches and Tourist Arrivals By Serhan Cevik
  27. Big-Data and Service Supply chain management: Challenges and opportunities By Badr Bentalha
  28. When are Google data useful to nowcast GDP? An approach via pre-selection and shrinkage By Laurent Ferrara; Anna Simoni
  29. Information, uncertainty and the manipulability of artificial intelligence autonomous vehicles systems By Osório, António (António Miguel); Pinto, Alberto Adrego
  30. Opportunity Occupations and the Future of Work By Mels de Zeeuw
  31. Measuring Digital Development with Online Data: Digital Economies in Eastern Europe and Central Asia By Braesemann, Fabian; Stephany, Fabian

  1. By: Gambacorta, Leonardo; Huang, Yiping; Qiu, Han; Wang, Jingyi
    Abstract: This paper compares the predictive power of credit scoring models based on machine learning techniques with that of traditional loss and default models. Using proprietary transaction-level data from a leading fintech company in China for the period between May and September 2017, we test the performance of different models to predict losses and defaults both in normal times and when the economy is subject to a shock. In particular, we analyse the case of an (exogenous) change in regulation policy on shadow banking in China that caused lending to decline and credit conditions to deteriorate. We find that the model based on machine learning and non-traditional data is better able to predict losses and defaults than traditional models in the presence of a negative shock to the aggregate credit supply. One possible reason for this is that machine learning can better mine the non-linear relationship between variables in a period of stress. Finally, the comparative advantage of the model that uses the fintech credit scoring technique based on machine learning and big data tends to decline for borrowers with a longer credit history.
    Keywords: credit risk; credit scoring; Fintech; Machine Learning; non-traditional information
    JEL: G17 G18 G23 G32
    Date: 2019–12
  2. By: Roberto Molinari (Auburn University); Gaetan Bakalli (University of Geneva - Geneva School of Economics and Management); Stéphane Guerrier (University of Geneva - Geneva School of Economics and Management); Cesare Miglioli (University of Geneva - Geneva School of Economics and Management); Samuel Orso (University of Geneva - Geneva School of Economics and Management); O. Scaillet (University of Geneva GSEM and GFRI; Swiss Finance Institute; University of Geneva - Research Center for Statistics)
    Abstract: Predictive power has always been the main research focus of learning algorithms with the goal of minimizing the test error for supervised classification and regression problems. While the general approach for these algorithms is to consider all possible attributes in a dataset to best predict the response of interest, an important branch of research is focused on sparse learning in order to avoid overfitting which can greatly affect the accuracy of out-of-sample prediction. However, in many practical settings we believe that only an extremely small combination of different attributes affect the response whereas even sparse-learning methods can still preserve a high number of attributes in high-dimensional settings and possibly deliver inconsistent prediction performance. As a consequence, the latter methods can also be hard to interpret for researchers and practitioners, a problem which is even more relevant for the “black-box”-type mechanisms of many learning approaches. Finally, aside from needing to quantify prediction uncertainty, there is often a problem of replicability since not all data-collection procedures measure (or observe) the same attributes and therefore cannot make use of proposed learners for testing purposes. To address all the previous issues, we propose to study a procedure that combines screening and wrapper methods and aims to find a library of extremely low-dimensional attribute combinations (with consequent low data collection and storage costs) in order to (i) match or improve the predictive performance of any particular learning method which uses all attributes as an input (including sparse learners); (ii) provide a low-dimensional network of attributes easily interpretable by researchers and practitioners; and (iii) increase the potential replicability of results due to a diversity of attribute combinations defining strong learners with equivalent predictive power. We call this algorithm “Sparse Wrapper AlGorithm” (SWAG).
    Keywords: interpretable machine learning, big data, wrapper, sparse learning, meta learning, ensemble learning, greedy algorithm, feature selection, variable importance network
    JEL: C45 C51 C52 C53 C55 C87
    Date: 2020–06
  3. By: Marijn A. Bolhuis; Brett Rayner
    Abstract: We develop a framework to nowcast (and forecast) economic variables with machine learning techniques. We explain how machine learning methods can address common shortcomings of traditional OLS-based models and use several machine learning models to predict real output growth with lower forecast errors than traditional models. By combining multiple machine learning models into ensembles, we lower forecast errors even further. We also identify measures of variable importance to help improve the transparency of machine learning-based forecasts. Applying the framework to Turkey reduces forecast errors by at least 30 percent relative to traditional models. The framework also better predicts economic volatility, suggesting that machine learning techniques could be an important part of the macro forecasting toolkit of many countries.
    Keywords: Production growth; Capacity utilization; Economic growth; Stock markets; Emerging markets; Forecasts; Nowcasting; Machine learning; GDP growth; Cross-validation; Random Forest; Ensemble; Turkey; forecast error; factor model; predictor; forecast; OLS
    Date: 2020–02–28
  4. By: Marijn A. Bolhuis; Brett Rayner
    Abstract: We leverage insights from machine learning to optimize the tradeoff between bias and variance when estimating economic models using pooled datasets. Specifically, we develop a simple algorithm that estimates the similarity of economic structures across countries and selects the optimal pool of countries to maximize out-of-sample prediction accuracy of a model. We apply the new algorithm by nowcasting output growth with a panel of 102 countries and are able to significantly improve forecast accuracy relative to alternative pools. The algorithm improves nowcast performance for advanced economies, as well as emerging market and developing economies, suggesting that machine learning techniques using pooled data could be an important macro tool for many countries.
    Keywords: Economic models; Production growth; Developing countries; Emerging markets; Data analysis; Machine learning; GDP growth; forecasts; panel data; pooling; forecast error; DGP; forecast; economic structure; output growth
    Date: 2020–02–28
  5. By: Martin, Ian; Nagel, Stefan
    Abstract: Modern investors face a high-dimensional prediction problem: thousands of observable variables are potentially relevant for forecasting. We reassess the conventional wisdom on market efficiency in light of this fact. In our model economy, which resembles a typical machine learning setting, N assets have cash flows that are a linear function of J firm characteristics, but with uncertain coefficients. Risk-neutral Bayesian investors impose shrinkage (ridge regression) or sparsity (Lasso) when they estimate the J coefficients of the model and use them to price assets. When J is comparable in size to N, returns appear cross-sectionally predictable using firm characteristics to an econometrician who analyzes data from the economy ex post. A factor zoo emerges even without p-hacking and data-mining. Standard in-sample tests of market efficiency reject the no-predictability null with high probability, despite the fact that investors optimally use the information available to them in real time. In contrast, out-of-sample tests retain their economic meaning.
    Keywords: Big Data; Machine Learning; Market Efficiency
    JEL: C11 C12 C58 G10 G12 G14
    Date: 2019–12
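    The shrinkage step mentioned in the abstract above can be illustrated with a minimal sketch, assuming a closed-form ridge estimator. The simulated data, the sizes N and J, and the penalty value are hypothetical and are not taken from the paper; the sketch only shows that a ridge penalty pulls the estimated coefficients toward zero when J is comparable in size to N.

```python
import numpy as np

def ridge_coefficients(X, y, penalty):
    """Closed-form ridge estimate: solve (X'X + penalty * I) b = X'y."""
    n_features = X.shape[1]
    gram = X.T @ X + penalty * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ y)

# Hypothetical simulated economy: N assets, J firm characteristics.
rng = np.random.default_rng(0)
N, J = 60, 40
X = rng.standard_normal((N, J))
true_beta = 0.1 * rng.standard_normal(J)
y = X @ true_beta + rng.standard_normal(N)

beta_ols = ridge_coefficients(X, y, penalty=0.0)     # no shrinkage (OLS)
beta_ridge = ridge_coefficients(X, y, penalty=10.0)  # illustrative penalty

ols_norm = float(np.linalg.norm(beta_ols))
ridge_norm = float(np.linalg.norm(beta_ridge))
print(f"OLS coefficient norm:   {ols_norm:.3f}")
print(f"Ridge coefficient norm: {ridge_norm:.3f}")
```

    A larger penalty shrinks the coefficient vector further; a Lasso penalty would instead set some coefficients exactly to zero, which has no closed form and is not shown here.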
  6. By: Alekseeva, Liudmila; Azar, José; Gine, Mireia; Samila, Sampsa; Taska, Bledi
    Abstract: We document a dramatic increase in the demand for AI skills in online job postings over the period 2010-2019. The demand for AI skills is highest in IT occupations, followed by architecture/engineering, life/physical/social sciences, and management. The sectors with the highest demand for AI are information, professional services, and finance. At the firm level, higher demand for AI skills is associated in the cross-section with larger market capitalization, higher cash holdings, and higher investments in R&D. We also document a large wage premium for job postings that require AI skills, as well as a wage premium for non-AI vacancies posted by firms with a high share of AI vacancies. Interestingly, managerial occupations have the highest wage premium for AI skills.
    Keywords: artificial intelligence; Machine Learning; technology diffusion; Wage Premium
    Date: 2020–01
  7. By: E. Ramos-Pérez; P. J. Alonso-González; J. J. Núñez-Velázquez
    Abstract: An appropriate calibration and forecasting of volatility and market risk are some of the main challenges faced by companies that have to manage the uncertainty inherent to their investments or funding operations such as banks, pension funds or insurance companies. This has become even more evident after the 2007-2008 Financial Crisis, when the forecasting models assessing the market risk and volatility failed. Since then, a significant number of theoretical developments and methodologies have appeared to improve the accuracy of the volatility forecasts and market risk assessments. Following this line of thinking, this paper introduces a model based on using a set of Machine Learning techniques, such as Gradient Descent Boosting, Random Forest, Support Vector Machine and Artificial Neural Network, where those algorithms are stacked to predict S&P500 volatility. The results suggest that our construction outperforms other habitual models on the ability to forecast the level of volatility, leading to a more accurate assessment of the market risk.
    Date: 2020–06
  8. By: Daniel Bartl; Samuel Drapeau; Jan Obloj; Johannes Wiesel
    Abstract: We consider sensitivity of a generic stochastic optimization problem to model uncertainty. We take a non-parametric approach and capture model uncertainty using Wasserstein balls around the postulated model. We provide explicit formulae for the first order correction to both the value function and the optimizer and further extend our results to optimization under linear constraints. We present applications to statistics, machine learning, mathematical finance and uncertainty quantification. In particular, we provide explicit first-order approximation for square-root LASSO regression coefficients and deduce coefficient shrinkage compared to the ordinary least squares regression. We consider robustness of call option pricing and deduce a new Black-Scholes sensitivity, a non-parametric version of the so-called Vega. We also compute sensitivities of optimized certainty equivalents in finance and propose measures to quantify robustness of neural networks to adversarial examples.
    Date: 2020–06
  9. By: Susan Ariel Aaronson (George Washington University)
    Abstract: The world’s oceans are in trouble. Global warming is causing sea levels to rise and reducing the supply of food in the oceans. The ecological balance of the ocean has been disturbed by invasive species and cholera. Many pesticides and nutrients used in agriculture end up in the coastal waters, resulting in oxygen depletion that kills marine plants and shellfish. Meanwhile the supply of fish is declining due to overfishing. Yet to flourish, humankind requires healthy oceans; the oceans generate half of the oxygen we breathe, and, at any given moment, they contain more than 97% of the world’s water. Oceans provide at least a sixth of the animal protein people eat. Living oceans absorb carbon dioxide from the atmosphere and reduce climate change impacts. Many civil society groups (NGOs) are trying to protect this shared resource. For example, OceanMind uses satellite data and artificial intelligence (AI) to analyze the movements of vessels and compare their activities to historical patterns. The NGO can thus identify damaging behavior such as overfishing
    Keywords: data governance, AI, free trade, FTA, personal data, data protection
    JEL: F13 O3 O25 O38 O33
    Date: 2020–07
  10. By: Cyrille Lenoel; Garry Young
    Abstract: This paper presents the results of a survey that identifies real-time turning point indicators published by international statistical and economic institutions. It reports the evidence on past and present indicators used, the methodology underlying their construction and the way the indicators are presented. We find that business and consumer surveys are the most popular source of data and composite indicators like diffusion or first component are the most popular types of indicators. The use of novel databases, big data and machine learning has been limited so far but has a promising future.
    Keywords: business cycles, turning points, recession, leading indicator, composite indicator, diffusion index, bridge model, Markov-switching model
    JEL: C22 C25 C35 E32 E37
    Date: 2020–04
  11. By: Ali Hirsa; Weilong Fu
    Abstract: We investigate solving partial integro-differential equations (PIDEs) using unsupervised deep learning in this paper. To price options, assuming underlying processes follow Lévy processes, we need to solve PIDEs. In supervised deep learning, pre-calculated labels are used to train neural networks to fit the solution of the PIDE. In unsupervised deep learning, neural networks are employed as the solution, and the derivatives and the integrals in the PIDE are calculated based on the neural network. By matching the PIDE and its boundary conditions, the neural network gives an accurate solution of the PIDE. Once trained, it would be fast for calculating option values as well as option Greeks.
    Date: 2020–06
  12. By: Argentesi, Elena; Buccirossi, Paolo; Calvano, Emilio; Duso, Tomaso; Marrazzo, Alessia; Nava, Salvatore
    Abstract: This paper presents a broad retrospective evaluation of mergers and merger decisions in the digital sector. We first discuss the most crucial features of digital markets such as network effects, multi-sidedness, big data, and rapid innovation that create important challenges for competition policy. We show that these features have been key determinants of the theories of harm in major merger cases in the past few years. We then analyse the characteristics of almost 300 acquisitions carried out by three major digital companies (Amazon, Facebook, and Google) between 2008 and 2018. We cluster target companies on their area of economic activity and show that they span a wide range of economic sectors. In most cases, their products and services appear to be complementary to those supplied by the acquirers. Moreover, target companies seem to be particularly young, being four years old or younger in nearly 60% of cases at the time of the acquisition. Finally, we examine two important merger cases, Facebook/Instagram and Google/Waze, providing a systematic assessment of the theories of harm considered by the UK competition authorities as well as evidence on the evolution of the market after the transactions were approved. We discuss whether the CAs performed complete and careful analyses to foresee the competitive consequences of the investigated mergers and whether a more effective merger control regime can be achieved within the current legal framework.
    Keywords: Antitrust; Big Data; Digital Markets; Ex-post; mergers; network effects; platforms
    JEL: K21 L4
    Date: 2019–12
  13. By: Philippe Goulet Coulombe
    Abstract: Over the last decades, an impressive number of non-linearities have been proposed to reconcile reduced-form macroeconomic models with the data. Many of them boil down to letting linear regression coefficients evolve through time: threshold/switching/smooth-transition regression; structural breaks and random walk time-varying parameters. While all of these schemes are reasonably plausible in isolation, I argue that they are much more in agreement with the data if they are combined. To this end, I propose Macroeconomic Random Forests, which adapts the canonical Machine Learning (ML) algorithm to the problem of flexibly modeling evolving parameters in a linear macro equation. The approach exhibits clear forecasting gains over a wide range of alternatives and successfully predicts the drastic 2008 rise in unemployment. The obtained generalized time-varying parameters (GTVPs) are shown to behave differently compared to random walk coefficients by adapting nicely to the problem at hand, whether it is regime-switching behavior or long-run structural change. By dividing the typical ML interpretation burden into looking at each TVP separately, I find that the resulting forecasts are, in fact, quite interpretable. An application to the US Phillips curve reveals it is probably not flattening the way you think.
    Date: 2020–06
  14. By: Boolens, Joost; Cockx, Bart; Lechner, Michael
    Abstract: We investigate heterogeneous employment effects of Flemish training programmes. Based on administrative individual data, we analyse programme effects at various aggregation levels using Modified Causal Forests (MCF), a causal machine learning estimator for multiple programmes. While all programmes have positive effects after the lock-in period, we find substantial heterogeneity across programmes and types of unemployed. Simulations show that assigning unemployed to programmes that maximise individual gains as identified in our estimation can considerably improve effectiveness. Simplified rules, such as one giving priority to unemployed with low employability, mostly recent migrants, lead to about half of the gains obtained by more sophisticated rules.
    Keywords: active labour market policy; Causal machine learning; conditional average treatment effects; modified causal forest; policy evaluation
    JEL: J68
    Date: 2020–01
  15. By: David Zenz (The Vienna Institute for International Economic Studies, wiiw)
    Abstract: We introduce a measure of linkage for the relationship between cities/regions, based on time series features of search engine queries. The features are derived from a time series decomposition using STL, i.e. seasonal and trend decomposition using Loess: the strength of the trend, which describes how strong the underlying trend of the series is (whether rising or falling), and the linearity of the series over the last five years, which gives the direction of the trend. The combination of these two features for both directions of search interest, e.g. the search interest for a certain topic in the city of Berlin based on search queries posed in Vienna, allows for the analysis of the development of this computed measure of linkage for the period 2004-2020 in various search engine categories provided by Google Trends between cities/regions in Europe. We then present examples based on the city of Vienna as a point-of-interest for selected topics and propose a dashboard for policy decisions. Disclaimer: The study was made possible by financial support from the Cultural Department of the City of Vienna (MA 7).
    Keywords: Time Series Analysis, Big Data, Google Trends, Search Engine Queries, Policy
    JEL: C49 C80 C82 C87 C88 M30 R00 Z10 Z30
    Date: 2020–06
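    The "strength of trend" feature named in the abstract above can be sketched as follows, assuming the definition commonly used in the time series features literature, max(0, 1 - Var(remainder) / Var(trend + remainder)); the paper may define it differently. The function takes the trend and remainder components of a decomposition as inputs, and the synthetic series below stand in for real STL output, which is not computed here.

```python
import numpy as np

def trend_strength(trend, remainder):
    """Strength of trend: max(0, 1 - Var(remainder) / Var(trend + remainder))."""
    trend = np.asarray(trend, dtype=float)
    remainder = np.asarray(remainder, dtype=float)
    deseasonalised_var = np.var(trend + remainder)
    if deseasonalised_var == 0.0:
        return 0.0  # constant series: no measurable trend
    return max(0.0, 1.0 - np.var(remainder) / deseasonalised_var)

# Hypothetical components standing in for the output of an STL decomposition.
t = np.arange(120)
rng = np.random.default_rng(1)
noise = rng.standard_normal(120)

strong = trend_strength(0.5 * t, noise)        # clear linear trend plus noise
weak = trend_strength(np.zeros(120), noise)    # no trend at all
print(f"strong trend: {strong:.3f}, no trend: {weak:.3f}")
```

    The measure is the same for rising and falling trends, which is why the abstract pairs it with a separate linearity feature that gives the direction.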
  16. By: Kwadwo Osei Bonsu; Jie Song
    Abstract: It is said that data and information are the new oil: whoever handles the data handles the emerging future of the global economy. Complex algorithms and intelligence-based filter programs are used to manage, store, handle and maneuver vast amounts of data for specific purposes. This paper seeks to find the bridge between artificial intelligence and its impact on international policy implementation in the light of geopolitical influence, the global economy and the future of labor markets. We hypothesize that the distortion in labor markets caused by artificial intelligence can be mitigated by a collaborative international foreign policy on the deployment of AI in industrial circles. We then propose a disposition for the essentials of AI-based foreign policy and its implementation, while asking questions such as 'could AI become the real Invisible Hand discussed by economists?'.
    Date: 2020–06
  17. By: Feng, Gavin; Giglio, Stefano W; Xiu, Dacheng
    Abstract: We propose a model selection method to systematically evaluate the contribution to asset pricing of any new factor, above and beyond what a high-dimensional set of existing factors explains. Our methodology accounts for model selection mistakes that produce a bias due to omitted variables, unlike standard approaches that assume perfect variable selection. We apply our procedure to a set of factors recently discovered in the literature. While most of these new factors are shown to be redundant relative to the existing factors, a few have statistically significant explanatory power beyond the hundreds of factors proposed in the past.
    Keywords: Elastic Net; Factors; Lasso; Machine Learning; PCA; Post-Selection Inference; Regularized Two-Pass Estimation; Stochastic discount factor; variable selection
    Date: 2020–01
  18. By: Jungsik Hwang
    Abstract: Extracting previously unknown patterns and information in time series is central to many real-world applications. In this study, we introduce a novel approach to modeling financial time series using a deep learning model. We use a Long Short-Term Memory (LSTM) network equipped with trainable initial hidden states. By learning to reconstruct time series, the proposed model can represent high-dimensional time series data with its parameters. An experiment with Korean stock market data showed that the model was able to capture the relative similarity between a large number of stock prices in its latent space. The model was also able to predict future stock trends from the latent space. The proposed method can help to identify relationships among many time series, and it could be applied to financial applications, such as optimizing investment portfolios.
    Date: 2020–07
  19. By: Hassan, Tarek Alexander; Hollander, Stephan; Tahoun, Ahmed; van Lent, Laurence
    Abstract: Using tools from computational linguistics, we construct new measures of the impact of Brexit on listed firms in the United States and around the world; these measures are based on the proportion of discussions in quarterly earnings conference calls on the costs, benefits, and risks associated with the UK's intention to leave the EU. We identify which firms expect to gain or lose from Brexit and which are most affected by Brexit uncertainty. We then estimate effects of the different types of Brexit exposure on firm-level outcomes. We find that the impact of Brexit-related uncertainty extends far beyond British or even European firms; US and international firms most exposed to Brexit uncertainty lost a substantial fraction of their market value and have also reduced hiring and investment. In addition to Brexit uncertainty (the second moment), we find that international firms overwhelmingly expect negative direct effects from Brexit (the first moment) should it come to pass. Most prominently, firms expect difficulties from regulatory divergence, reduced labor mobility, limited trade access, and the costs of post-Brexit operational adjustments. Consistent with the predictions of canonical theory, this negative sentiment is recognized and priced in stock markets but has not yet significantly affected firm actions.
    Keywords: Brexit; cross-country effects; Machine Learning; sentiment; uncertainty
    JEL: D8 E22 E24 E32 E6 F0 G18 G32 G38 H32
    Date: 2019–12
  20. By: Jonathan Readshaw; Stefano Giani
    Abstract: This work presents a Convolutional Neural Network (CNN) for the prediction of next-day stock fluctuations using company-specific news headlines. Experiments to evaluate model performance using various configurations of word-embeddings and convolutional filter widths are reported. The total number of convolutional filters used is far fewer than is common, reducing the dimensionality of the task without loss of accuracy. Furthermore, multiple hidden layers with decreasing dimensionality are employed. A classification accuracy of 61.7% is achieved using pre-learned embeddings that are fine-tuned during training to represent the specific context of this task. Multiple filter widths are also implemented to detect different length phrases that are key for classification. Trading simulations are conducted using the presented classification results. Initial investments are more than tripled over an 838-day testing period using the optimal classification configuration and a simple trading strategy. Two novel methods are presented to reduce the risk of the trading simulations. Adjustment of the sigmoid class threshold and re-labelling headlines using multiple classes form the basis of these methods. A combination of these approaches is found to more than double the Average Trade Profit (ATP) achieved during baseline simulations.
    Date: 2020–06
  21. By: Jozef Barunik; Michael Ellington
    Abstract: We propose new measures to characterize dynamic network connections in large financial and economic systems. In doing so, our measures allow one to describe and understand causal network structures that evolve throughout time and over horizons using variance decomposition matrices from time-varying parameter VAR (TVP VAR) models. These methods allow researchers and practitioners to examine network connections over any horizon of interest whilst also being applicable to a wide range of economic and financial data. Our empirical application redefines the meaning of big in big data, in the context of TVP VAR models, and tracks dynamic connections among illiquidity ratios of all S&P500 constituents. We then study the information content of these measures for the market return and real economy.
    Date: 2020–07
  22. By: Hansen, Karsten; Misra, Kanishka; Pai, Mallesh
    Abstract: Motivated by their increasing prevalence, we study outcomes when competing sellers use machine learning algorithms to run real-time dynamic price experiments. These algorithms are often misspecified, ignoring the effect of factors outside their control, e.g. competitors' prices. We show that the long-run prices depend on the informational value (or signal to noise ratio) of price experiments: if low, the long-run prices are consistent with the static Nash equilibrium of the corresponding full information setting. However, if high, the long-run prices are supra-competitive, and the full information joint-monopoly outcome is possible. We show this occurs via a novel channel: competitors' algorithms' prices end up running correlated experiments. Therefore, sellers' misspecified models overestimate own price sensitivity, resulting in higher prices. We discuss the implications for competition policy.
    Keywords: algorithmic pricing; bandit algorithms; Collusion; Misspecified models
    Date: 2020–01
  23. By: Dainis Zegners; Uwe Sunde; Anthony Strittmatter
    Abstract: This paper presents novel evidence for the prevalence of deviations from rational behavior in human decision making – and for the corresponding causes and consequences. The analysis is based on move-by-move data from chess tournaments and an identification strategy that compares behavior of professional chess players to a rational behavioral benchmark that is constructed using modern chess engines. The evidence documents the existence of several distinct dimensions in which human players deviate from a rational benchmark. In particular, the results show deviations related to loss aversion, time pressure, fatigue, and cognitive limitations. The results also demonstrate that deviations do not necessarily lead to worse performance. Consistent with an important influence of intuition and experience, faster decisions are associated with more frequent deviations from the rational benchmark, yet they are also associated with better performance.
    Keywords: rational strategies, artificial intelligence, behavioural bias
    JEL: D01 D90 C70 C80
    Date: 2020
  24. By: Massimo Guidolin; Manuela Pedio
    Abstract: Using data on international online media coverage and tone of the Brexit referendum, we test whether media coverage or media tone provides the larger improvement in forecasting the conditional variance of weekly FTSE 100 stock returns. We find that versions of standard symmetric and asymmetric Generalized Autoregressive Conditional Heteroskedasticity (GARCH) models augmented to include media coverage, and especially media tone scores, outperform traditional GARCH models both in- and out-of-sample.
    Keywords: Attention, Sentiment, Text Mining, Forecasting, Conditional Variance, GARCH model, Brexit
    JEL: C53 C58 G17
    Date: 2020
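    Note: the idea of augmenting a GARCH variance equation with a media variable can be illustrated with a minimal sketch. It assumes a GARCH(1,1) recursion with one lagged exogenous regressor, sigma2[t] = omega + alpha * r[t-1]^2 + beta * sigma2[t-1] + gamma * x[t-1]. The function name, the parameter values, and the `tone` series are hypothetical, and this is not necessarily the specification used in the paper.

```python
def garch_x_variance(returns, tone, omega, alpha, beta, gamma):
    """Conditional variance path of a GARCH(1,1) with one exogenous regressor.

    Hypothetical illustration: `tone` stands in for a media tone score.
    Setting gamma = 0 gives the plain GARCH(1,1) recursion.
    """
    if len(returns) != len(tone):
        raise ValueError("returns and tone must have the same length")
    if not returns:
        return []
    # Start from the sample second moment (one common choice, assumed here).
    sigma2 = [sum(r * r for r in returns) / len(returns)]
    for t in range(1, len(returns)):
        value = (omega
                 + alpha * returns[t - 1] ** 2
                 + beta * sigma2[t - 1]
                 + gamma * tone[t - 1])
        # A small floor keeps the variance strictly positive.
        sigma2.append(max(value, 1e-12))
    return sigma2


# Usage: compare the plain and the augmented variance paths.
weekly_returns = [0.01, -0.02, 0.015]
tone_scores = [0.0, 1.0, 0.0]
plain = garch_x_variance(weekly_returns, tone_scores, 1e-5, 0.1, 0.8, 0.0)
augmented = garch_x_variance(weekly_returns, tone_scores, 1e-5, 0.1, 0.8, 1e-4)
```

    In practice the parameters would be estimated by maximum likelihood rather than fixed by hand; the sketch only shows how the extra regressor enters the variance equation.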
  25. By: Gorshkova, Taisiya (Горшкова, Таисия) (The Russian Presidential Academy of National Economy and Public Administration); Turuntseva, Marina (Турунцева, Марина) (The Russian Presidential Academy of National Economy and Public Administration)
    Abstract: This work analyzes existing theoretical models for forecasting regional macro-indicators and studies the possibility of forecasting Russian data using the selected theoretical approaches. A comparative analysis of theoretical approaches to modeling regional data is carried out. The approaches considered include diversification indices based on various economic theories, analysis of the possibility of using a composite welfare index as a proxy variable for the economic situation, and application of dynamic and non-linear models to regional data. The study uses data on a set of macro indicators (CPI, GRP per capita, unemployment rate, average per capita income, etc.) for all regions of Russia, as well as for regions grouped by federal districts and by clusters determined on the basis of theoretical approaches. Using the Russian data, various diversification indices were analyzed, and ensembles of neural networks and vector autoregressions were constructed, some of which account for the spatial dependence between the indicators.
    Date: 2020–03
  26. By: Serhan Cevik
    Abstract: The widespread availability of internet search data is a new source of high-frequency information that can potentially improve the precision of macroeconomic forecasting, especially in areas with data constraints. This paper investigates whether travel-related online search queries enhance accuracy in the forecasting of tourist arrivals to The Bahamas from the U.S. The results indicate that the forecast model incorporating internet search data provides additional information about tourist flows over a univariate approach using the traditional autoregressive integrated moving average (ARIMA) model and multivariate models with macroeconomic indicators. The Google Trends-augmented model improves predictability of tourist arrivals by about 30 percent compared to the benchmark ARIMA model and more than 20 percent compared to the model extended only with income and relative prices.
    Keywords: Real effective exchange rates; Economic growth; Economic forecasting; Real exchange rates; Personal income; Forecasting; tourist arrivals; Google Trends; time-series models; WP; ARIMA; tourist arrival; autoregressive; forecast model; time-series
    Date: 2020–01–31
  27. By: Badr Bentalha (USMBA - Université Sidi Mohamed Ben Abdellah)
    Abstract: Big-Data describes the large volume of data used by economic actors. The data are analysed quickly to produce instant analyses and support data storage. This system is useful in several economic fields, such as logistics and supply chain management (SCM). SCM is the management of physical and information flows, from customer to customer and from supplier to supplier, in order to offer a satisfactory response to customer needs. SCM was born and flourished in an industrial context. Nevertheless, several current studies address SCM in the service sector. How, then, does the use of Big-Data help improve the performance of supply chain management in service companies? To answer this question, we define the concepts of SCM in services, focusing on the specific features of Service Supply Chain Management (SSCM), and the concept of Big-Data, while analyzing the impact of Big-Data on the efficiency of SCM in service companies.
    Keywords: Service Supply Chain, SCM, Service Logistics, Big-Data, Supply chain, Digital Supply Chain, Service Companies
    Date: 2020
  28. By: Laurent Ferrara; Anna Simoni
    Abstract: We analyse whether, and when, a large set of Google search data can be useful to increase euro area GDP nowcasting accuracy once we control for information contained in official variables. To deal with these data we propose an estimator that combines variable pre-selection and Ridge regularization and study its theoretical properties. We show that in a period of cyclical stability Google data convey useful information for real-time nowcasting of GDP growth at the beginning of the quarter when macroeconomic information is lacking. On the other hand, in periods that exhibit a sudden downward shift in GDP growth rate, including Google search data in the information set improves nowcasting accuracy even when official macroeconomic information is available.
    Date: 2020–07
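    Note: the combination of variable pre-selection and Ridge regularization can be sketched as a two-step procedure. The sketch below assumes pre-selection by absolute correlation with the target, followed by closed-form ridge on the retained columns. The ranking rule, the function names, and the toy data are assumptions for illustration; the abstract does not specify the authors' pre-selection rule or tuning.

```python
def _corr(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def preselect(X, y, k):
    """Indices of the k columns of X most correlated (in absolute value) with y."""
    cols = list(zip(*X))
    ranked = sorted(range(len(cols)),
                    key=lambda j: abs(_corr(cols[j], y)), reverse=True)
    return sorted(ranked[:k])


def ridge_fit(X, y, lam):
    """Solve (X'X + lam * I) b = X'y by Gauss-Jordan elimination (no intercept).

    With lam > 0 the system is positive definite, so a solution always exists.
    """
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    for i in range(p):
        # Partial pivoting for numerical stability.
        piv = max(range(i, p), key=lambda r: abs(A[r][i]))
        A[i], A[piv] = A[piv], A[i]
        b[i], b[piv] = b[piv], b[i]
        for r in range(p):
            if r != i:
                f = A[r][i] / A[i][i]
                A[r] = [a - f * c for a, c in zip(A[r], A[i])]
                b[r] -= f * b[i]
    return [b[i] / A[i][i] for i in range(p)]


# Usage on toy data: the target depends only on the first column.
X = [[1, 1, 0], [2, -1, 1], [3, 1, 0], [4, -1, -1], [5, 1, 0], [6, -1, 1]]
y = [2 * row[0] for row in X]
kept = preselect(X, y, 1)
X_kept = [[row[j] for j in kept] for row in X]
coef = ridge_fit(X_kept, y, 91.0)
```

    Pre-selection reduces the number of candidate search series before the penalty is applied, so the ridge step works on a smaller, more relevant set of predictors.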
  29. By: Osório, António (António Miguel); Pinto, Alberto Adrego
    Abstract: In an avoidable harmful situation, autonomous vehicle systems are expected to choose the course of action that causes the least damage to everybody. However, this behavioral protocol implies some predictability. In this context, we show that if the autonomous vehicle decision process is perfectly known, then malicious, opportunistic, terrorist, criminal and non-civic individuals may have incentives to manipulate it. Consequently, some level of uncertainty is necessary for the system to be manipulation-proof. Uncertainty removes the incentives to misbehave because it increases the risk and likelihood of unsuccessful manipulation. However, uncertainty may also decrease the quality of the decision process, with a negative impact on efficiency and welfare for society. We also discuss other possible solutions to this problem.
    Keywords: Artificial intelligence; Autonomous vehicles; Manipulation; Malicious Behavior; Uncertainty; 625 - Land transport engineering
    JEL: D81 L62 O32
    Date: 2019
  30. By: Mels de Zeeuw
    Abstract: From 19th-century workers smashing textile factory machines to John Maynard Keynes's musing on technological unemployment, worries and passions about machines replacing workers are hundreds of years old. More recently, robots and computers (through artificial intelligence) are replacing a growing number of human skills, and this has become an important topic of conversation in public policy. It is also increasingly on the minds of workers and students making decisions about their investments in skills and career preparation.
    Keywords: opportunity occupations
    JEL: J21
    Date: 2020–02–12
  31. By: Braesemann, Fabian; Stephany, Fabian
    Abstract: The Internet, like railways and roads in the past, is paving the way for innovation and altering the way in which citizens, consumers, businesses, and governments function and interact with each other. This digital revolution is empowering societies. It opens new, effective, and scalable services for governments and the private sector. It provides us with a more adaptive, data-driven approach to decision making in many aspects of our life. Digitalisation is particularly relevant for developing countries, as they can seize the opportunity for leapfrogging in order to become part of the global digitalised economy. With the example of Eastern Europe and Central Asia, this work illustrates how openly available online data can be used to identify, monitor, and visualise trends in digital economic development. Our interactive online dashboard allows researchers, policy-makers, and the public to explore four aspects of digital development: E-services, online labour markets, online knowledge creation and access to online knowledge.
    Date: 2020–06–27

This nep-big issue is ©2020 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.