nep-big New Economics Papers
on Big Data
Issue of 2020‒07‒20
thirty-two papers chosen by
Tom Coupé
University of Canterbury

  1. Estimating investments in General Purpose Technologies. The case of AI Investments in Europe By Daniel Nepelski; Maciej Sobolewski
  2. Data and Competition: a General Framework with Applications to Mergers, Market Structure, and Privacy Policy By de Cornière, Alexandre; Taylor, Greg
  3. Dynamic pricing and revenues of Airbnb listings: estimating heterogeneous causal effects By Veronica Leoni; Jan Olof William Nilsson
  4. Seemingly Unrelated Regression with Measurement Error: Estimation via Markov chain Monte Carlo and Mean Field Variational Bayes Approximation By Georges Bresson; Anoop Chaturvedi; Mohammad Arshad Rahman; Shalabh
  5. AI Watch - Artificial Intelligence for the public sector: Report of the “1st Peer Learning Workshop on the use and impact of AI in public services”, Brussels 11-12 February 2020 By Colin van Noordt; Gianluca Misuraca; Marzia Mortati; Francesca Rizzo; Tjerk Timan
  6. Minimax Estimation of Conditional Moment Models By Nishanth Dikkala; Greg Lewis; Lester Mackey; Vasilis Syrgkanis
  7. Predicting cell phone adoption metrics using satellite imagery By Edward J. Oughton; Jatin Mathur
  8. Interest Rate Model with Investor Attitude and Text Mining (Published in IEEE Access) By Souta Nakatani; Kiyohiko G. Nishimura; Taiga Saito; Akihiko Takahashi
  9. Valid Causal Inference with (Some) Invalid Instruments By Jason Hartford; Victor Veitch; Dhanya Sridhar; Kevin Leyton-Brown
  10. Gender, Technology, and the Future of Work By Mariya Brussevich; Era Dabla-Norris; Christine Kamunge; Pooja Karnane; Salma Khalid; Kalpana Kochhar
  11. Text as data: a machine learning-based approach to measuring uncertainty By Rickard Nyman; Paul Ormerod
  12. Forecast Accuracy Matters for Hurricane Damages By Andrew B. Martinez
  13. It's Not Only Size That Matters: Determinants of Estonia's E-Governance Success By Stephany, Fabian
  14. Banking Supervision, Monetary Policy and Risk-Taking: Big Data Evidence from 15 Credit Registers By Altavilla, Carlo; Boucinha, Miguel; Peydró, José Luis; Smets, Frank
  15. Machine Learning Econometrics: Bayesian algorithms and methods By Dimitris Korobilis; Davide Pettenuzzo
  16. Belief Distortions and Macroeconomic Fluctuations By Francesco Bianchi; Sydney C. Ludvigson; Sai Ma
  17. An overall view of key problems in algorithmic trading and recent progress By Michaël Karpe
  18. Building(s and) cities: Delineating urban areas with a machine learning algorithm By Arribas-Bel, Daniel; Garcia-Lopez, Miquel-Angel; Viladecans-Marsal, Elisabet
  19. Optimal Asset Allocation For Outperforming A Stochastic Benchmark Target By Chendi Ni; Yuying Li; Peter Forsyth; Ray Carroll
  20. How Much Does Reducing Inequality Matter for Global Poverty? By Christoph Lakner; Daniel Gerszon Mahler; Mario Negre; Espen Beer Prydz
  21. Collusion detection on public procurement in Russia By Rey, Alexey (Рэй, Алексей); Shagarov, Dmitriy (Шагаров, Дмитрий); Andronova, Ekaterina (Андронова, Екатерина); Molchanova, Glafira (Молчанова, Глафира)
  22. Nowcasting Industrial Production Using Unconventional Data Sources By Fornaro, Paolo
  23. A comparative study of forecasting Corporate Credit Ratings using Neural Networks, Support Vector Machines, and Decision Trees By Parisa Golbayani; Ionuț Florescu; Rupak Chatterjee
  24. Deep Learning modeling of Limit Order Book: a comparative perspective By Antonio Briola; Jeremy Turiel; Tomaso Aste
  25. Cognitive Performance in the Home Office - Evidence from Professional Chess By Künn, Steffen; Seel, Christian; Zegners, Dainis
  26. Determining Secondary Attributes for Credit Evaluation in P2P Lending By Revathi Bhuvaneswari; Antonio Segalini
  27. Deep Investing in Kyle's Single Period Model By Paul Friedrich; Josef Teichmann
  28. Reading between the lines - Using text analysis to estimate the loss function of the ECB By Paloviita, Maritta; Haavio, Markus; Jalasjoki, Pirkka; Kilponen, Juha; Vänni, Ilona
  29. Terrorist Attacks, Cultural Incidents and the Vote for Radical Parties: Analyzing Text from Twitter By Giavazzi, Francesco; Iglhaut, Felix; Lemoli, Giacomo; Rubera, Gaia
  30. Fear of the coronavirus and the stock markets By Lyócsa, Štefan; Baumöhl, Eduard; Výrost, Tomáš; Molnár, Peter
  31. How Firms will affect the Future of Work By Jacques Bughin
  32. Cities, Lights, and Skills in Developing Economies By Davis, Donald R; Dingel, Jonathan; Miscio, Antonio

  1. By: Daniel Nepelski (European Commission - JRC); Maciej Sobolewski (European Commission - JRC)
    Abstract: In spite of a large interest in General Purpose Technologies, it is unclear how much economies invest in their development and diffusion. For example, different sources report different figures for investments in Artificial Intelligence (AI). This blurs the understanding of the AI-driven revolution among policy makers and business leaders and constrains informed decision making. The current report presents an original and comprehensive methodology to estimate AI investments. It rests on three assumptions: First, it considers AI as a general-purpose technology (GPT). Second, it includes not only investments in the core AI technology, but also in complementary assets and capabilities necessary for its adoption. Finally, the methodology recognises different roles that the public and private sectors play in the process of AI creation and implementation. Using this approach, AI investments in Europe are estimated.
    Keywords: General Purpose Technology, GPT, Artificial Intelligence, AI, digital technologies, investments, intangibles, Europe
    Date: 2020–05
  2. By: de Cornière, Alexandre; Taylor, Greg
    Abstract: What role does data play in competition? This question has been at the center of a fierce debate around competition policy in the digital economy. We use a competition-in-utilities approach to provide a general framework for studying the competitive effects of data, encompassing a wide range of markets where data has many different uses. We identify conditions for data to be unilaterally pro- or anti-competitive (UPC or UAC). The conditions are simple and often require no information about market demand. We apply our framework to study various applications of data, including training algorithms, targeting advertisements, and personalizing prices. We also show that whether data is UPC or UAC has important implications for policy issues such as data-driven mergers, market structure, and privacy policy.
    Keywords: Big Data; Competition; data-driven mergers; privacy
    JEL: L1 L4 L5
    Date: 2020–02
  3. By: Veronica Leoni (Universitat de les Illes Balears); Jan Olof William Nilsson (Universitat de les Illes Balears)
    Abstract: This paper investigates the extent to which the implementation of intertemporal price discrimination affects Airbnb listings’ revenue. We found that on average, a price surge (i.e., increasing the price as we approach the date of service consumption) has an adverse effect on revenue. However, the magnitude of such effect exhibits significant heterogeneity among listings. Through the application of generalized random forests, a causal machine learning technique, we identify exacerbating and moderating treatment modifiers and shed light on the listing dimensions that cause price surges to be particularly detrimental for hosts’ revenues.
    Keywords: Airbnb; dynamic pricing; heterogeneous causal effects; generalized random forest.
    JEL: Z30 Z31 C21 C26
    Date: 2020
  4. By: Georges Bresson; Anoop Chaturvedi; Mohammad Arshad Rahman; Shalabh
    Abstract: Linear regression with measurement error in the covariates is a heavily studied topic; however, the statistics/econometrics literature is almost silent on estimating a multi-equation model with measurement error. This paper considers a seemingly unrelated regression model with measurement error in the covariates and introduces two novel estimation methods: a pure Bayesian algorithm (based on Markov chain Monte Carlo techniques) and its mean field variational Bayes (MFVB) approximation. The MFVB method has the added advantage of being computationally fast and can handle big data. An issue pertinent to measurement error models is parameter identification, and this is resolved by employing a prior distribution on the measurement error variance. The methods are shown to perform well in multiple simulation studies, where we analyze the impact on posterior estimates arising due to different values of reliability ratio or variance of the true unobserved quantity used in the data generating process. The paper further implements the proposed algorithms in an application drawn from the health literature and shows that modeling measurement error in the data can improve model fitting.
    Date: 2020–06
  5. By: Colin van Noordt (Tallinn Technology University); Gianluca Misuraca (European Commission - JRC); Marzia Mortati (Politecnico di Milano); Francesca Rizzo (Politecnico di Milano); Tjerk Timan (TNO)
    Abstract: This is the report of the 1st AI WATCH Peer Learning Workshop on the Use and Impact of AI in Public Services organized by JRC/B6 jointly with DG CNECT/H4. The workshop discussed the current state of AI in the public sector, which shows that AI is widely experimented with across European countries. From the analysis of results of the JRC activities on AI for the public sector conducted as part of the AI WATCH it emerged that these technologies are mostly applied in general public services, economic affairs and health services, with chatbots most often mentioned. Most AI-based innovation, however, seems to be incremental or technical, with innovation truly causing disruptions in the public service model being limited. From the discussion in working groups and plenary it also emerged that activities of the AI Watch task on AI for the public sector should prioritize the following policy domains: Health, Education, Public Order, Housing, Transport and Agriculture. Finally, since an important part of the debate revolved around the topic of AI and data governance, it was decided to focus the 2nd AI WATCH Peer Learning Workshop with Member States on this domain.
    Keywords: AI, public sector, governance, data, EU
    Date: 2020–06
  6. By: Nishanth Dikkala; Greg Lewis; Lester Mackey; Vasilis Syrgkanis
    Abstract: We develop an approach for estimating models described via conditional moment restrictions, with a prototypical application being non-parametric instrumental variable regression. We introduce a min-max criterion function, under which the estimation problem can be thought of as solving a zero-sum game between a modeler who is optimizing over the hypothesis space of the target model and an adversary who identifies violating moments over a test function space. We analyze the statistical estimation rate of the resulting estimator for arbitrary hypothesis spaces, with respect to an appropriate analogue of the mean squared error metric, for ill-posed inverse problems. We show that when the minimax criterion is regularized with a second moment penalty on the test function and the test function space is sufficiently rich, then the estimation rate scales with the critical radius of the hypothesis and test function spaces, a quantity which typically gives tight fast rates. Our main result follows from a novel localized Rademacher analysis of statistical learning problems defined via minimax objectives. We provide applications of our main results for several hypothesis spaces used in practice such as: reproducing kernel Hilbert spaces, high dimensional sparse linear functions, spaces defined via shape constraints, ensemble estimators such as random forests, and neural networks. For each of these applications we provide computationally efficient optimization methods for solving the corresponding minimax problem (e.g. stochastic first-order heuristics for neural networks). In several applications, we show how our modified mean squared error rate, combined with conditions that bound the ill-posedness of the inverse problem, leads to mean squared error rates. We conclude with an extensive experimental analysis of the proposed methods.
    Date: 2020–06
  7. By: Edward J. Oughton; Jatin Mathur
    Abstract: Approximately half of the global population does not have access to the internet, even though digital access can reduce poverty by revolutionizing economic development opportunities. Due to a lack of data, Mobile Network Operators (MNOs), governments and other digital ecosystem actors struggle to effectively determine if telecommunication investments are viable, especially in greenfield areas where demand is unknown. This leads to a lack of investment in network infrastructure, resulting in a phenomenon commonly referred to as the 'digital divide'. In this paper we present a method that uses publicly available satellite imagery to predict telecoms demand metrics, including cell phone adoption and spending on mobile services, and apply the method to Malawi and Ethiopia. A predictive machine learning approach can capture up to 40% of data variance, compared to existing approaches which only explain up to 20% of the data variance. The method is a starting point for developing more sophisticated predictive models of telecom infrastructure demand using publicly available satellite imagery and image recognition techniques. The evidence produced can help to better inform investment and policy decisions which aim to reduce the digital divide.
    Date: 2020–06
  8. By: Souta Nakatani (MTEC and Graduate School of Economics, University of Tokyo); Kiyohiko G. Nishimura (National Graduate Institute for Policy Studies (GRIPS) and CARF, University of Tokyo); Taiga Saito (Graduate School of Economics and CARF, University of Tokyo); Akihiko Takahashi (Graduate School of Economics and CARF, University of Tokyo)
    Abstract: This paper develops and estimates an interest rate model with investor attitude factors, which are extracted by a text mining method. First, we consider two contrasting attitudes (optimistic versus conservative) towards uncertainties about Brownian motions driving the economy, develop an interest rate model, and obtain an empirical framework of the economy consisting of permanent and transitory factors. Second, we apply the framework to a bond market under the extremely low interest rate environment of recent years, and show that our three-factor model with level, steepening and flattening factors based on different investor attitudes is capable of explaining the yield curve in the Japanese government bond (JGB) markets. Third, text mining of a large text base of daily financial news reports enables us to distinguish between steepening and flattening factors, and from these textual data we can identify events and economic conditions that are associated with the steepening and flattening factors. We then estimate the yield curve and three factors with frequencies of relevant word groups chosen from textual data in addition to observed interest rates. Finally, we show that the estimated three factors, extracted only from the bond market data, are able to explain the movement in stock markets, in particular the Nikkei 225 index.
    Date: 2020–05
  9. By: Jason Hartford; Victor Veitch; Dhanya Sridhar; Kevin Leyton-Brown
    Abstract: Instrumental variable methods provide a powerful approach to estimating causal effects in the presence of unobserved confounding. But a key challenge when applying them is the reliance on untestable "exclusion" assumptions that rule out any relationship between the instrument variable and the response that is not mediated by the treatment. In this paper, we show how to perform consistent IV estimation despite violations of the exclusion assumption. In particular, we show that when one has multiple candidate instruments, only a majority of these candidates (or, more generally, the modal candidate-response relationship) needs to be valid to estimate the causal effect. Our approach uses an estimate of the modal prediction from an ensemble of instrumental variable estimators. The technique is simple to apply and is "black-box" in the sense that it may be used with any instrumental variable estimator as long as the treatment effect is identified for each valid instrument independently. As such, it is compatible with recent machine-learning based estimators that allow for the estimation of conditional average treatment effects (CATE) on complex, high dimensional data. Experimentally, we achieve accurate estimates of conditional average treatment effects using an ensemble of deep network-based estimators, including on a challenging simulated Mendelian Randomization problem.
    Date: 2020–06
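The idea of taking a modal estimate across an ensemble of instrument-specific estimators can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `modal_estimate`, the fixed-width window used to locate the mode, and the numbers below are all assumptions made for illustration. Each input value stands for the effect estimated from one candidate instrument, and the sketch returns the mean of the largest group of estimates that agree with one another.

```python
from typing import Sequence


def modal_estimate(estimates: Sequence[float], bandwidth: float) -> float:
    """Mean of the largest cluster of estimates lying within `bandwidth` of each other.

    Illustrative stand-in for a mode estimator; the window rule is an assumption.
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    ordered = sorted(estimates)
    best_window = ordered[:1]
    start = 0
    for end in range(len(ordered)):
        # Shrink the window from the left until its spread fits the bandwidth.
        while ordered[end] - ordered[start] > bandwidth:
            start += 1
        window = ordered[start:end + 1]
        if len(window) > len(best_window):
            best_window = window
    return sum(best_window) / len(best_window)


# Hypothetical per-instrument estimates: three valid instruments agree
# near 2.0, while two invalid instruments give biased values.
estimates = [1.95, 2.05, 2.00, 3.40, 0.70]
effect = modal_estimate(estimates, bandwidth=0.2)
print(effect)
```

In this toy example the two biased estimates fall outside the densest window, so they do not affect the returned value, whereas a plain average of all five would be pulled away from the agreeing group.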
  10. By: Mariya Brussevich; Era Dabla-Norris; Christine Kamunge; Pooja Karnane; Salma Khalid; Kalpana Kochhar
    Abstract: New technologies (digitalization, artificial intelligence, and machine learning) are changing the way work gets done at an unprecedented rate. Helping people adapt to a fast-changing world of work and ameliorating its deleterious impacts will be the defining challenge of our time. What are the gender implications of this changing nature of work? How vulnerable are women’s jobs to risk of displacement by technology? What policies are needed to ensure that technological change supports a closing, and not a widening, of gender gaps? This SDN finds that women, on average, perform more routine tasks than men across all sectors and occupations, tasks that are most prone to automation. Given the current state of technology, we estimate that 26 million female jobs in 30 countries (28 OECD member countries, Cyprus, and Singapore) are at a high risk of being displaced by technology (i.e., facing higher than 70 percent likelihood of being automated) within the next two decades. Female workers face a higher risk of automation compared to male workers (11 percent of the female workforce, relative to 9 percent of the male workforce), albeit with significant heterogeneity across sectors and countries. Less well-educated and older female workers (aged 40 and above), as well as those in low-skill clerical, service, and sales positions are disproportionately exposed to automation. Extrapolating our results, we find that around 180 million female jobs are at high risk of being displaced globally. Policies are needed to endow women with required skills; close gender gaps in leadership positions; bridge the digital gender divide (as ongoing digital transformation could confer greater flexibility in work, benefiting women); ease transitions for older and low-skilled female workers.
    Keywords: Information technology;Technological innovation;Labor force participation;Gender equality;Gender;Automation, Technological Change, Jobs, Female Labor Force, Occupational Choice, Gender Equality
    Date: 2018–10–08
  11. By: Rickard Nyman; Paul Ormerod
    Abstract: The Economic Policy Uncertainty (EPU) index has gained considerable traction with both academics and policy practitioners. Here, we analyse news feed data to construct a simple, general measure of uncertainty in the United States using a highly cited machine learning methodology. Over the period January 1996 through May 2020, we show that the series unequivocally Granger-causes the EPU and there is no Granger-causality in the reverse direction.
    Date: 2020–06
  12. By: Andrew B. Martinez (Office of Macroeconomic Analysis, US Department of the Treasury)
    Abstract: I analyze damages from hurricane strikes on the United States since 1955. Using machine learning methods to select the most important drivers for damages, I show that large errors in a hurricane’s predicted landfall location result in higher damages. This relationship holds across a wide range of model specifications and when controlling for ex-ante uncertainty and potential endogeneity. Using a counterfactual exercise I find that the cumulative reduction in damages from forecast improvements since 1970 is about $82 billion, which exceeds the U.S. government’s spending on the forecasts and private willingness to pay for them.
    Keywords: Adaptation, Model Selection, Natural Disasters, Uncertainty
    JEL: C51 C52 Q51 Q54
    Date: 2020–05
  13. By: Stephany, Fabian
    Abstract: User data fuel the digital economy, while individual privacy is at stake. Governments react differently to this challenge. Estonia, a small Baltic state, has become a role model for the renewal of the social contract in times of big data. While e-governance usage has been growing in many parts of Europe during the last ten years, some regions are lagging behind. The Estonian example suggests that online governance is most accepted in a small state, with a young population, trustworthy institutions and the need of technological renewal. This work examines the development of e-governance usage (citizens interacting digitally with the government) during the last decade in Europe from a comprehensive cross-country perspective: Size, age and trust are relevant for the usage of digital government services in Europe. However, the quality of past communication infrastructure is not related to e-governance popularity.
    Date: 2020–05–12
  14. By: Altavilla, Carlo; Boucinha, Miguel; Peydró, José Luis; Smets, Frank
    Abstract: We analyse the effects of supranational versus national banking supervision on credit supply, and its interactions with monetary policy. For identification, we exploit: (i) a new, proprietary dataset based on 15 European credit registers; (ii) the institutional change leading to the centralisation of European banking supervision; (iii) high-frequency monetary policy surprises; (iv) differences across euro area countries, also vis-à-vis non-euro area countries. We show that supranational supervision reduces credit supply to firms with very high ex-ante and ex-post credit risk, while stimulating credit supply to firms without loan delinquencies. Moreover, the increased risk-sensitivity of credit supply driven by centralised supervision is stronger for banks operating in stressed countries. Exploiting heterogeneity across banks, we find that the mechanism driving the results is higher quantity and quality of human resources available to the supranational supervisor rather than changes in incentives due to the reallocation of supervisory responsibility to the new institution. Finally, there are crucial complementarities between supervision and monetary policy: centralised supervision offsets excessive bank risk-taking induced by a more accommodative monetary policy stance, but does not offset more productive risk-taking. Overall, we show that using multiple credit registers, for the first time in the literature, is crucial for external validity.
    Keywords: AnaCredit; Banking; euro area crisis; monetary policy; Supervision
    JEL: E51 E52 E58 G01 G21 G28
    Date: 2020–01
  15. By: Dimitris Korobilis (University of Glasgow); Davide Pettenuzzo (Brandeis University)
    Abstract: As the amount of economic and other data generated worldwide increases vastly, a challenge for future generations of econometricians will be to master efficient algorithms for inference in empirical models with large information sets. This Chapter provides a review of popular estimation algorithms for Bayesian inference in econometrics and surveys alternative algorithms developed in machine learning and computing science that allow for efficient computation in high-dimensional settings. The focus is on scalability and parallelizability of each algorithm, as well as their ability to be adopted in various empirical settings in economics and finance.
    Keywords: MCMC; approximate inference; scalability; parallel computation
    JEL: C11 C15 C49 C88
    Date: 2020–04
  16. By: Francesco Bianchi; Sydney C. Ludvigson; Sai Ma
    Abstract: This paper combines a data rich environment with a machine learning algorithm to provide estimates of time-varying systematic expectational errors (“belief distortions”) about the macroeconomy embedded in survey responses. We find that such distortions are large on average even for professional forecasters, with all respondent-types over-weighting their own forecast relative to other information. Forecasts of inflation and GDP growth oscillate between optimism and pessimism by quantitatively large amounts. To investigate the dynamic relation of belief distortions with the macroeconomy, we construct indexes of aggregate (across surveys and respondents) expectational biases in survey forecasts. Over-optimism is associated with an increase in aggregate economic activity. Our estimates provide a benchmark to evaluate theories for which information capacity constraints, extrapolation, sentiments, ambiguity aversion, and other departures from full information rational expectations play a role in business cycles.
    JEL: E03 E17
    Date: 2020–06
  17. By: Michaël Karpe
    Abstract: We summarize the fundamental issues at stake in algorithmic trading, and the progress made in this field over the last twenty years. We first present the key problems of algorithmic trading, describing the concepts of optimal execution, optimal placement, and price impact. We then discuss the most recent advances in algorithmic trading through the use of Machine Learning, discussing the use of Deep Learning, Reinforcement Learning, and Generative Adversarial Networks.
    Date: 2020–06
  18. By: Arribas-Bel, Daniel; Garcia-Lopez, Miquel-Angel; Viladecans-Marsal, Elisabet
    Abstract: This paper proposes a novel methodology for delineating urban areas based on a machine learning algorithm that groups buildings within portions of space of sufficient density. To do so, we use the precise geolocation of all 12 million buildings in Spain. We exploit building heights to create a new dimension for urban areas, namely, the vertical land, which provides a more accurate measure of their size. To better understand their internal structure and to illustrate an additional use for our algorithm, we also identify employment centers within the delineated urban areas. We test the robustness of our method and compare our urban areas to other delineations obtained using administrative borders and commuting-based patterns. We show that: 1) our urban areas are more similar to the commuting-based delineations than the administrative boundaries but that they are more precisely measured; 2) when analyzing the urban areas' size distribution, Zipf's law appears to hold for their population, surface and vertical land; and 3) the impact of transportation improvements on the size of the urban areas is not underestimated.
    Keywords: Buildings; City size; Machine Learning; Transportation; urban areas
    JEL: R12 R14 R2 R40
    Date: 2020–02
  19. By: Chendi Ni; Yuying Li; Peter Forsyth; Ray Carroll
    Abstract: We propose a data-driven Neural Network (NN) optimization framework to determine the optimal multi-period dynamic asset allocation strategy for outperforming a general stochastic target. We formulate the problem as an optimal stochastic control with an asymmetric, distribution shaping, objective function. The proposed framework is illustrated with the asset allocation problem in the accumulation phase of a defined contribution pension plan, with the goal of achieving a higher terminal wealth than a stochastic benchmark. We demonstrate that the data-driven approach is capable of learning an adaptive asset allocation strategy directly from historical market returns, without assuming any parametric model of the financial market dynamics. Following the optimal adaptive strategy, investors can make allocation decisions simply depending on the current state of the portfolio. The optimal adaptive strategy outperforms the benchmark constant proportion strategy, achieving a higher terminal wealth with a 90% probability, a 46% higher median terminal wealth, and a significantly more right-skewed terminal wealth distribution. We further demonstrate the robustness of the optimal adaptive strategy by testing the performance of the strategy on bootstrap resampled market data, which has different distributions compared to the training data.
    Date: 2020–06
  20. By: Christoph Lakner; Daniel Gerszon Mahler; Mario Negre; Espen Beer Prydz
    Abstract: The goals of ending extreme poverty by 2030 and working towards a more equal distribution of incomes are part of the United Nations’ Sustainable Development Goals. Using data from 166 countries comprising 97.5% of the world’s population, we simulate scenarios for global poverty from 2019 to 2030 under various assumptions about growth and inequality. We use different assumptions about growth incidence curves to model changes in inequality, and rely on a machine-learning algorithm called model-based recursive partitioning to model how growth in GDP is passed through to growth as observed in household surveys. When holding within-country inequality unchanged and letting GDP per capita grow according to World Bank forecasts and historically observed growth rates, our simulations suggest that the number of extreme poor (living on less than $1.90/day) will remain above 600 million in 2030, resulting in a global extreme poverty rate of 7.4%. If the Gini index in each country decreases by 1% per year, the global poverty rate could reduce to around 6.3% in 2030, equivalent to 89 million fewer people living in extreme poverty. Reducing each country’s Gini index by 1% per year has a larger impact on global poverty than increasing each country’s annual growth 1 percentage point above forecasts. We also study the impact of COVID-19 on poverty and find that the pandemic may have driven around 60 million people into extreme poverty in 2020. If the virus increased the Gini by 2% in all countries, then more than 90 million may have been driven into extreme poverty in 2020.
    Date: 2020–06
  21. By: Rey, Alexey (Рэй, Алексей) (The Russian Presidential Academy of National Economy and Public Administration); Shagarov, Dmitriy (Шагаров, Дмитрий) (The Russian Presidential Academy of National Economy and Public Administration); Andronova, Ekaterina (Андронова, Екатерина) (The Russian Presidential Academy of National Economy and Public Administration); Molchanova, Glafira (Молчанова, Глафира) (The Russian Presidential Academy of National Economy and Public Administration)
    Abstract: In public procurement markets, situations often arise in which participants, instead of competing with each other for government contracts, divide up shares of the procurement market and raise prices above the competitive level. This paper proposes using methods from econometrics, mathematical statistics and machine learning to predict the likelihood that a purchase involving given counterparties on the supplier side, together with the other candidates to win the tender, will be the subject of a successful complaint to the FAS alleging a cartel agreement between suppliers.
    Date: 2020–03
  22. By: Fornaro, Paolo
    Abstract: In this work, we rely on unconventional data sources to nowcast the year-on-year growth rate of Finnish industrial production, for different industries. As predictors, we use real-time truck traffic volumes measured automatically in different geographical locations around Finland, as well as electricity consumption data. In addition to standard time-series models, we look into the adoption of machine learning techniques to compute the predictions. We find that the use of non-typical data sources such as the volume of truck traffic is beneficial, in terms of predictive power, giving us substantial gains in nowcasting performance compared to an autoregressive model. Moreover, we find that the adoption of machine learning techniques substantially improves the accuracy of our predictions in comparison to standard linear models. While the average nowcasting errors we obtain are higher compared to the current revision errors of the official statistical institute, our nowcasts provide clear signals of the overall trend of the series and of sudden changes in growth.
    Keywords: Flash Estimates, Machine Learning, Big Data, Nowcasting
    JEL: C33 C55 E37
    Date: 2020–06–30
  23. By: Parisa Golbayani; Ionuț Florescu; Rupak Chatterjee
    Abstract: Credit ratings are one of the primary keys that reflect the level of riskiness and reliability of corporations to meet their financial obligations. Rating agencies tend to take extended periods of time to provide new ratings and update older ones. Therefore, credit scoring assessments using artificial intelligence have gained a lot of interest in recent years. Successful machine learning methods can provide rapid analysis of credit scores while updating older ones on a daily time scale. Related studies have shown that neural networks and support vector machines outperform other techniques by providing better prediction accuracy. The purpose of this paper is twofold. First, we provide a survey and a comparative analysis of results from the literature applying machine learning techniques to predict credit ratings. Second, we ourselves apply four machine learning techniques deemed useful in previous studies (Bagged Decision Trees, Random Forest, Support Vector Machine and Multilayer Perceptron) to the same datasets. We evaluate the results using a 10-fold cross validation technique. The results of the experiment for the datasets chosen show superior performance for decision tree based models. In addition to the conventional accuracy measure of classifiers, we introduce a measure of accuracy based on notches called "Notch Distance" to analyze the performance of the above classifiers in the specific context of credit rating. This measure tells us how far the predictions are from the true ratings. We further compare the performance of three major rating agencies, Standard & Poor's, Moody's and Fitch, where we show that the difference in their ratings is comparable with the decision tree prediction versus the actual rating on the test dataset.
    Date: 2020–07
  24. By: Antonio Briola; Jeremy Turiel; Tomaso Aste
    Abstract: The present work addresses theoretical and practical questions in the domain of Deep Learning for High Frequency Trading, with a thorough review and analysis of the literature and state-of-the-art models. Random models, Logistic Regressions, LSTMs, LSTMs equipped with an Attention mask, CNN-LSTMs and MLPs are compared on the same tasks, feature space, and dataset, and clustered according to pairwise similarity and performance metrics. The underlying dimensions of the modeling techniques are hence investigated to understand whether these are intrinsic to the Limit Order Book's dynamics. It is possible to observe that the Multilayer Perceptron performs comparably to or better than state-of-the-art CNN-LSTM architectures, indicating that dynamic spatial and temporal dimensions are a good approximation of the LOB's dynamics, but not necessarily the true underlying dimensions.
    Date: 2020–07
  25. By: Künn, Steffen (RS: GSBE Theme Learning and Work, Macro, International & Labour Economics); Seel, Christian (RS: GSBE Theme Conflict & Cooperation, Microeconomics & Public Economics); Zegners, Dainis
    Abstract: During the recent COVID-19 pandemic, traditional (offline) chess tournaments were prohibited and instead held online. We exploit this as a unique setting to assess the impact of moving offline tasks online on the cognitive performance of individuals. We use the Artificial Intelligence embodied in a powerful chess engine to assess the quality of chess moves and associated errors. Using within-player comparisons, we find a statistically and economically significant decrease in performance when competing online compared to competing offline. Our results suggest that teleworking might have adverse effects on workers performing cognitive tasks.
    JEL: H12 L23 M11 M54
    Date: 2020–07–14
  26. By: Revathi Bhuvaneswari; Antonio Segalini
    Abstract: There has been an increased need for secondary means of credit evaluation by both traditional banking organizations as well as peer-to-peer lending entities. This is especially important in the present technological era where sticking with strict primary credit histories doesn't help distinguish between a 'good' and a 'bad' borrower, and ends up hurting both the individual borrower as well as the investor as a whole. We utilized machine learning classification and clustering algorithms to accurately predict a borrower's creditworthiness while identifying specific secondary attributes that contribute to this score. While extensive research has been done in predicting when a loan would be fully paid, the area of feature selection for lending is relatively new. We achieved 65% F1 and 73% AUC on the LendingClub data while identifying key secondary attributes.
    Date: 2020–06
  27. By: Paul Friedrich; Josef Teichmann
    Abstract: The Kyle model describes how an equilibrium of order sizes and security prices naturally arises between a trader with insider information and the price providing market maker as they interact through a series of auctions. Ever since being introduced by Albert S. Kyle in 1985, the model has become important in the study of market microstructure models with asymmetric information. As it is well understood, it serves as an excellent opportunity to study how modern deep learning technology can be used to replicate and better understand equilibria that occur in certain market learning problems. We model the agents in Kyle's single period setting using deep neural networks. The networks are trained by interacting following the rules and objectives as defined by Kyle. We show how the right network architectures and training methods lead to the agents' behaviour converging to the theoretical equilibrium that is predicted by Kyle's model.
    Date: 2020–06
  28. By: Paloviita, Maritta; Haavio, Markus; Jalasjoki, Pirkka; Kilponen, Juha; Vänni, Ilona
    Abstract: We measure the tone (sentiment) of the ECB’s Governing Council regarding the economic outlook at the time of each monetary policy meeting and use this information together with the Eurosystem/ECB staff macroeconomic projections to directly estimate the Governing Council’s loss function. Our results support earlier, more indirect findings, based on reaction function estimations, that either the ECB has been more averse to inflation above the 2% ceiling or the de facto inflation aim has been considerably below 2%. Our results suggest further that an inflation aim of 2% combined with asymmetry is a plausible specification of the ECB’s preferences.
    JEL: E31 E52 E58
    Date: 2020–07–06
  29. By: Giavazzi, Francesco; Iglhaut, Felix; Lemoli, Giacomo; Rubera, Gaia
    Abstract: We study the role of perceived threats from cultural diversity induced by terrorist attacks and a salient criminal event on public discourse and voters' support for far-right parties. We first develop a rule which allocates Twitter users in Germany to electoral districts and then use a machine learning method to compute measures of textual similarity between the tweets they produce and tweets by accounts of the main German parties. Using the dates of the aforementioned exogenous events we estimate constituency-level shifts in similarity to party language. We find that following these events Twitter text becomes on average more similar to that of the main far-right party, AfD, while the opposite happens for some of the other parties. Regressing estimated shifts in similarity on changes in vote shares between federal elections we find a significant association. Our results point to the role of perceived threats on the success of nationalist parties.
    Keywords: National elections; Political parties; social media; Terrorism; Text Analysis
    JEL: C45 D72 H56
    Date: 2020–02
  30. By: Lyócsa, Štefan; Baumöhl, Eduard; Výrost, Tomáš; Molnár, Peter
    Abstract: Since the outbreak of the COVID-19 pandemic, stock markets around the world have experienced unprecedented declines, which have resulted in extremely high stock market uncertainty, measured as price variation. In this paper, we show that during such periods, Google Trends data represent a timely and valuable data source for forecasting price variation. Fear of the coronavirus, as measured by Google searches, is predictive of future stock market uncertainty for stock markets around the world. Google searches were also strongly correlated with the evolution of physical contagion (the number of new cases) and with implemented nonpharmaceutical interventions. The effect of pandemic-related policies on investors' attention and fear is thus very well captured by Google Trends data.
    Keywords: Coronavirus, Stock market, Uncertainty, Panic, Google Trends
    JEL: G01 G15
    Date: 2020
  31. By: Jacques Bughin
    Abstract: In the current debate over the Future of Work, there is little discussion about how firms anticipate the evolution of their demand for labor and the related mix of skills as they adopt Artificial Intelligence (AI) tools. This article contributes to this debate by leveraging a global survey of 3000 firms in 10 countries, covering the main sectors of the economy. Descriptive statistics from the survey are complemented by econometric analyses of corporate labor demand decisions. The findings are four-fold. First, these are still early days in the absorption of AI technologies, with less than 10% of companies investing in a majority of AI technologies and for multiple purposes. Second, while a portion of firms anticipates reducing employment as a result of adopting AI technologies, as many other companies anticipate labor growth or reorganizing employment. Third, this reallocation picture holds true when we examine further demand by labor functions and skills, with talent shifting toward more analytic, creative, and interaction skills, and away from administrative and routine-based functions, in line with past trends of skill- and routine-biased technological change. Fourth, and novel to the literature on the Future of Work, econometric results on employment change highlight that employment dynamics are driven by related spillover effects to product markets. Higher competition and larger expectations of market (share) deployment may counterbalance the negative automation effect on employment dynamics.
    Keywords: Artificial intelligence, Derived labor demand, Product market competition
    Date: 2020–06
  32. By: Davis, Donald R; Dingel, Jonathan; Miscio, Antonio
    Abstract: In developed economies, agglomeration is skill-biased: larger cities are skill-abundant and exhibit higher skilled wage premia. This paper characterizes the spatial distributions of skills in Brazil, China, and India. To facilitate comparisons with developed-economy findings, we construct metropolitan areas for each of these economies by aggregating finer geographic units on the basis of contiguous areas of light in nighttime satellite images. Our results validate this procedure. These lights-based metropolitan areas mirror commuting-based definitions in the United States and Brazil. In China and India, which lack commuting-based definitions, lights-based metropolitan populations follow a power law, while administrative units do not. Examining variation in relative quantities and prices of skill across these metropolitan areas, we conclude that agglomeration is also skill-biased in Brazil, China, and India.
    Keywords: cities; Metropolitan areas; satellite images; skill-biased agglomeration; Zipf's law
    JEL: C8 O1 O18 R1
    Date: 2020–02

This nep-big issue is ©2020 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.