nep-big New Economics Papers
on Big Data
Issue of 2022‒02‒14
eighteen papers chosen by
Tom Coupé
University of Canterbury

  1. Modeling and Forecasting Intraday Market Returns: a Machine Learning Approach By Iuri H. Ferreira; Marcelo C. Medeiros
  2. Uncovering the Source of Machine Bias By Xiyang Hu; Yan Huang; Beibei Li; Tian Lu
  3. DeepSets and their derivative networks for solving symmetric PDEs By Maximilien Germain; Mathieu Laurière; Huyên Pham; Xavier Warin
  4. Applications of Signature Methods to Market Anomaly Detection By Erdinc Akyildirim; Matteo Gambara; Josef Teichmann; Syang Zhou
  5. Optimal monetary policy using reinforcement learning By Hinterlang, Natascha; Tänzer, Alina
  6. Machine Learning for Labour Market Matching By Mühlbauer, Sabrina; Weber, Enzo
  7. Economic development, weather shocks and child marriage in South Asia: A machine learning approach By Dietrich, Stephan; Meysonnat, Aline; Rosales, Francisco; Cebotari, Victor; Gassmann, Franziska
  8. Policy Evaluation of Waste Pricing Programs Using Heterogeneous Causal Effect Estimation By Marica Valente
  9. Monetary policy, Twitter and financial markets: evidence from social media traffic By Donato Masciandaro; Davide Romelli; Gaia Rubera
  10. A Survey of Quantum Computing for Finance By Dylan Herman; Cody Googin; Xiaoyuan Liu; Alexey Galda; Ilya Safro; Yue Sun; Marco Pistoia; Yuri Alexeev
  11. Labour-saving automation and occupational exposure: A text-similarity measure By Montobbio, Fabio; Staccioli, Jacopo; Virgillito, Maria Enrica; Vivarelli, Marco
  12. Automation and related technologies: A mapping of the new knowledge base By Santarelli, Enrico; Staccioli, Jacopo; Vivarelli, Marco
  13. Algorithm is Experiment: Machine Learning, Market Design, and Policy Eligibility Rules By Narita, Yusuke; Yata, Kohei
  14. Climate Risks and Realized Volatility of Major Commodity Currency Exchange Rates By Matteo Bonato; Oguzhan Cepni; Rangan Gupta; Christian Pierdzioch
  15. The impact of research independence on PhD students' careers: Large-scale evidence from France By Patsali, Sofia; Pezzoni, Michele; Visentin, Fabiana
  16. Sellin' in the Rain: Weather, Climate, and Retail Sales By Brigitte Roth Tran
  17. Economic analysis using higher frequency time series: Challenges for seasonal adjustment By Ollech, Daniel
  18. Sitting next to a dropout: Study success of students with peers that came to the lecture hall by a different route By Daniel Goller; Andrea Diem; Stefan C. Wolter

  1. By: Iuri H. Ferreira; Marcelo C. Medeiros
    Abstract: In this paper we examine the relation between market returns and volatility measures through machine learning methods in a high-frequency environment. We implement a minute-by-minute rolling-window intraday estimation method using two nonlinear models: Long Short-Term Memory (LSTM) neural networks and Random Forests (RF). Our estimations show that the CBOE Volatility Index (VIX) is the strongest candidate predictor for intraday market returns in our analysis, especially when implemented through the LSTM model. This model also significantly improves the performance of the lagged market return as a predictive variable. Finally, the intraday RF estimation outputs indicate that this method brings no performance improvement and may even worsen the results in some cases.
    Date: 2021–12
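A minimal sketch of the minute-by-minute rolling-window scheme in item 1. It is not the authors' code: the LSTM and random forest are swapped for a one-variable least-squares fit of the next-minute return on the lagged predictor (for example, the VIX), and the names `rolling_forecasts`, `ols_fit` and `window` are hypothetical.

```python
from statistics import fmean


def ols_fit(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b


def rolling_forecasts(returns, predictor, window):
    """One-step-ahead forecasts of returns[t] from predictor[t - 1].

    At each minute t the model is re-estimated on the most recent
    `window` pairs (predictor[s - 1], returns[s]) for s = t - window .. t - 1,
    so only information available before minute t is used.
    Returns a dict mapping t to the forecast.
    """
    forecasts = {}
    for t in range(window + 1, len(returns)):
        x = predictor[t - window - 1 : t - 1]
        y = returns[t - window : t]
        a, b = ols_fit(x, y)
        forecasts[t] = a + b * predictor[t - 1]
    return forecasts
```

Only `ols_fit` would change if a nonlinear learner were plugged in; the rolling bookkeeping, which re-fits on the latest `window` observations before every forecast so that no future data leaks in, stays the same.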
  2. By: Xiyang Hu; Yan Huang; Beibei Li; Tian Lu
    Abstract: We develop a structural econometric model to capture the decision dynamics of human evaluators on an online micro-lending platform, and estimate the model parameters using a real-world dataset. We find that two types of gender bias, preference-based bias and belief-based bias, are present in human evaluators' decisions. Both types of bias favor female applicants. Through counterfactual simulations, we quantify the effect of gender bias on loan granting outcomes and on the welfare of the company and the borrowers. Our results imply that both the preference-based bias and the belief-based bias reduce the company's profits: when the preference-based bias is removed, the company earns more profits, and when the belief-based bias is removed, the company's profits also increase. Both increases result from raising the approval probability for borrowers, especially male borrowers, who eventually pay back their loans. For borrowers, eliminating either bias reduces the gender gap in true positive rates in the credit risk evaluation. We also train machine learning algorithms on both the real-world data and the data from the counterfactual simulations, and compare the decisions made by those algorithms to see how evaluators' biases are inherited by the algorithms and reflected in machine-based decisions. We find that machine learning algorithms can mitigate both the preference-based bias and the belief-based bias.
    Date: 2022–01
  3. By: Maximilien Germain (EDF R&D; EDF R&D OSIRIS - Optimisation, Simulation, Risque et Statistiques pour les Marchés de l’Energie; LPSM (UMR_8001) - Laboratoire de Probabilités, Statistiques et Modélisations - Sorbonne Université - CNRS - Université de Paris); Mathieu Laurière (ORFE - Department of Operations Research and Financial Engineering, School of Engineering and Applied Science, Princeton University); Huyên Pham (LPSM (UMR_8001) - Université Paris Diderot - Paris 7 - Sorbonne Université - CNRS; FiME Lab - Laboratoire de Finance des Marchés d'Energie - EDF R&D - CREST - Université Paris Dauphine-PSL - Université Paris sciences et lettres; CREST - Centre de Recherche en Économie et Statistique - ENSAI - Ecole Nationale de la Statistique et de l'Analyse de l'Information [Bruz] - École polytechnique - ENSAE Paris - École Nationale de la Statistique et de l'Administration Économique - CNRS); Xavier Warin (EDF R&D; EDF R&D OSIRIS; FiME Lab - Laboratoire de Finance des Marchés d'Energie - EDF R&D - CREST - Université Paris Dauphine-PSL)
    Abstract: Machine learning methods for solving nonlinear partial differential equations (PDEs) are a hot topic, and different algorithms proposed in the literature show efficient numerical approximation in high dimension. In this paper, we introduce a class of PDEs that are invariant to permutations, which we call symmetric PDEs. Such problems are widespread, ranging from cosmology to quantum mechanics, and option pricing/hedging in multi-asset markets with exchangeable payoffs. Our main application actually comes from the particle approximation of mean-field control problems. We design deep learning algorithms based on certain types of neural networks, named PointNet and DeepSet (and their associated derivative networks), for simultaneously computing an approximation of the solution of symmetric PDEs and its gradient. We illustrate the performance and accuracy of the PointNet/DeepSet networks compared to classical feedforward ones, and provide several numerical results of our algorithm for the examples of a mean-field systemic risk problem, a mean-variance problem and a min/max linear-quadratic McKean-Vlasov control problem.
    Keywords: Permutation-invariant PDEs, symmetric neural networks, exchangeability, deep backward scheme, mean-field control
    Date: 2022
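The property that makes DeepSet-type networks suited to the symmetric PDEs of item 3 is that they pool per-particle features with a sum, so the output does not depend on the order of the particles. A toy forward pass with hypothetical, untrained weights (the paper's PointNet variants, derivative networks and training scheme are not reproduced here):

```python
import math


def deepset(xs, phi_w, phi_b, rho_w, rho_b):
    """Permutation-invariant map f(x_1, ..., x_n) = rho(sum_i phi(x_i)).

    xs holds scalar particle states. phi maps each state to a hidden
    vector via tanh(w * x + b); the hidden vectors are summed, which is
    the symmetric pooling step; rho is a linear read-out.
    All weights are plain lists chosen by hand for illustration.
    """
    hidden = [0.0] * len(phi_w)
    for x in xs:
        for k, (w, b) in enumerate(zip(phi_w, phi_b)):
            hidden[k] += math.tanh(w * x + b)
    return sum(r * h for r, h in zip(rho_w, hidden)) + rho_b
```

Reordering `xs` changes only the order of additions in the pooling step, so the result is unchanged up to floating-point rounding.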
  4. By: Erdinc Akyildirim; Matteo Gambara; Josef Teichmann; Syang Zhou
    Abstract: Anomaly detection is the process of identifying abnormal instances or events in data sets that deviate significantly from the norm. In this study, we propose a signature-based machine learning algorithm to detect rare or unexpected items in a given time-series data set. We present applications of signatures or randomized signatures as feature extractors for anomaly detection algorithms; additionally, we provide a simple, representation-theoretic justification for the construction of randomized signatures. Our first application is based on synthetic data and aims to distinguish between real and fake trajectories of stock prices, which are indistinguishable by visual inspection. We also show a real-life application using transaction data from the cryptocurrency market. In this case, our unsupervised learning algorithm identifies pump-and-dump attempts organized on social networks with F1 scores of up to 88%, achieving results close to the state of the art in the field, which is based on supervised learning.
    Date: 2022–01
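Item 4 uses path signatures as features. As a hedged sketch (not the authors' implementation, which also uses randomized signatures), the code below computes the signature of a piecewise-linear path truncated at level 2, concatenating the linear segments with Chen's identity. The function name `signature_level2` is hypothetical.

```python
def signature_level2(path):
    """Level-1 and level-2 signature terms of a piecewise-linear path.

    `path` is a list of points, each a tuple of d coordinates.
    Returns (s1, s2): s1[i] is the total increment of coordinate i and
    s2[i][j] is the iterated integral of dX^i followed by dX^j.
    """
    d = len(path[0])
    s1 = [0.0] * d
    s2 = [[0.0] * d for _ in range(d)]
    for p, q in zip(path, path[1:]):
        delta = [q[k] - p[k] for k in range(d)]
        # Chen's identity: appending a linear segment with increment delta
        # adds s1 (x) delta plus the segment's own term delta (x) delta / 2.
        # s2 is updated before s1 so the old level-1 terms are used.
        for i in range(d):
            for j in range(d):
                s2[i][j] += s1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        for k in range(d):
            s1[k] += delta[k]
    return s1, s2
```

Flattening `s1` and `s2` (for example of a time-augmented price path) yields a fixed-length feature vector that an unsupervised anomaly detector could consume; the abstract does not say which detector the authors pair with the features.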
  5. By: Hinterlang, Natascha; Tänzer, Alina
    Abstract: This paper introduces a reinforcement-learning-based approach to computing optimal interest rate reaction functions in terms of fulfilling inflation and output gap targets. The method is generally flexible enough to incorporate restrictions like the zero lower bound, nonlinear economy structures or asymmetric preferences. We use quarterly U.S. data from 1987:Q3 to 2007:Q2 to estimate (nonlinear) model transition equations, train optimal policies and perform counterfactual analyses to evaluate them, assuming that the transition equations remain unchanged. All of our resulting policy rules outperform other common rules as well as the actual federal funds rate. Given a neural network representation of the economy, our optimized nonlinear policy rules reduce the central bank's loss by over 43%. A DSGE model comparison exercise further indicates the robustness of the optimized rules.
    Keywords: Optimal Monetary Policy, Reinforcement Learning, Artificial Neural Network, Machine Learning, Reaction Function
    JEL: C45 C61 E52 E58
    Date: 2021
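Item 5 compares interest rate rules by the central bank's loss. The sketch below assumes a standard quadratic period loss, the squared inflation gap plus λ times the squared output gap (the weight `lam` and the exact specification are our assumptions, not taken from the paper), and uses the classic Taylor (1993) rule as an example of a common benchmark rule. The last function shows how a relative loss reduction such as "over 43%" is computed.

```python
def taylor_rule(inflation, output_gap, r_star=2.0, pi_star=2.0):
    """Taylor (1993) rule for the nominal policy rate, in percent."""
    return r_star + inflation + 0.5 * (inflation - pi_star) + 0.5 * output_gap


def central_bank_loss(inflation, output_gap, pi_star=2.0, lam=0.5):
    """Sum over periods of (pi - pi*)^2 + lam * y^2 (assumed quadratic loss)."""
    return sum((p - pi_star) ** 2 + lam * y ** 2
               for p, y in zip(inflation, output_gap))


def relative_loss_reduction(loss_baseline, loss_new):
    """Percentage reduction of a new rule's loss relative to a baseline rule."""
    return 100.0 * (loss_baseline - loss_new) / loss_baseline
```

In a reinforcement learning setup, an agent would search for a rule that minimizes the expected discounted loss over transitions simulated from the estimated economy; that training loop is not shown.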
  6. By: Mühlbauer, Sabrina (Institute for Employment Research (IAB), Nuremberg, Germany); Weber, Enzo (Institute for Employment Research (IAB), Nuremberg, Germany)
    Abstract: "This paper develops a large-scale application to improve the labour market matching process with model- and algorithm-based statistical methods. We use comprehensive administrative data on employment biographies covering individual and job-related information of workers in Germany. We estimate the probability that a job seeker gets employed in a certain occupational field. For this purpose, we make predictions with common statistical methods and machine learning (ML) methods. The findings suggest that ML performs better than the other methods regarding the out-of-sample classification error. In terms of the unemployment rate, the advantage of ML would correspond to a difference of 2.9 to 3.6 percentage points." (Author's abstract, IAB-Doku)
    Keywords: IAB-Open-Access-Publikation
    JEL: C14 C45 J64 C55
    Date: 2022–02–02
  7. By: Dietrich, Stephan (UNU-MERIT, Maastricht University); Meysonnat, Aline (University of Washington, Daniel J. Evans School of Public Policy and Governance); Rosales, Francisco (ESAN Graduate School of Business, Lima); Cebotari, Victor (University of Luxembourg); Gassmann, Franziska (UNU-MERIT, Maastricht University)
    Abstract: Globally, 21 percent of young women are married before their 18th birthday. Despite some progress in addressing child marriage, it remains a widespread practice, in particular in South Asia. While household predictors of child marriage have been studied extensively in the literature, the evidence base on macro-economic factors contributing to child marriage, and on models that predict where child marriage cases are most likely to occur, remains limited. In this paper we aim to fill this gap and explore region-level indicators to predict the persistence of child marriage in four countries in South Asia, namely Bangladesh, India, Nepal and Pakistan. We apply machine learning techniques to child marriage data and develop a prediction model that relies largely on regional and local inputs such as droughts, floods, population growth and nightlight data to model the incidence of child marriages. We find that our gradient boosting model is able to identify a large proportion of the true child marriage cases: it correctly classifies 78% of them, with a higher accuracy in Bangladesh (90% of the cases) and a lower accuracy in Nepal (71% of cases). In addition, in all countries the top 10 classification variables include nighttime light growth, a drought shock index over the previous year and over the last two years, and the regional level of education, suggesting that income shocks, regional economic activity and regional education levels play a significant role in predicting child marriage. Given its accuracy in predicting child marriage, our model is a valuable tool to support policy design in countries where household-level data remain limited.
    Keywords: child marriage, income shocks, machine learning, South Asia
    JEL: J1 J12 O15 Q54 R11
    Date: 2021–09–10
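The 78% figure in item 7 is the share of actual child marriage cases that the model labels as such, i.e. the recall (true positive rate). A minimal sketch of that computation, overall and by country; the 0/1 label encoding and the country codes in the usage are hypothetical.

```python
from collections import defaultdict


def recall(y_true, y_pred):
    """True positives divided by all actual positives (labels are 0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    positives = sum(1 for t in y_true if t == 1)
    return tp / positives if positives else float("nan")


def recall_by_group(groups, y_true, y_pred):
    """Recall computed separately within each group (e.g. each country)."""
    by_group = defaultdict(lambda: ([], []))
    for g, t, p in zip(groups, y_true, y_pred):
        by_group[g][0].append(t)
        by_group[g][1].append(p)
    return {g: recall(t, p) for g, (t, p) in by_group.items()}
```

Recall ignores false positives, so reporting precision alongside it would show how many flagged cases are false alarms.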
  8. By: Marica Valente
    Abstract: Using machine learning methods in a quasi-experimental setting, I study the heterogeneous effects of introducing waste prices (unit prices on household unsorted waste disposal) on waste demands and social welfare. First, using a unique panel of Italian municipalities with large variation in prices and observables, I show that waste demands are nonlinear. I find evidence of constant elasticities at low prices and increasing elasticities at high prices, driven by income effects and pre-policy waste habits. Second, I estimate policy impacts on pollution and municipal management costs, and compute the overall social cost savings for each municipality. Social welfare effects are positive for all municipalities after three years of adoption, when waste prices cause significant waste avoidance.
    Keywords: Waste pricing, causal effect heterogeneity, machine learning, welfare
    JEL: C14 C21 C52 Q53
    Date: 2021
  9. By: Donato Masciandaro; Davide Romelli; Gaia Rubera
    Abstract: How does central bank communication affect financial markets? This paper shows that the monetary policy announcements of three major central banks, i.e. the European Central Bank, the Federal Reserve and the Bank of England, trigger significant discussions on monetary policy on Twitter. Using machine learning techniques, we identify Twitter messages related to monetary policy around the release of monetary policy decisions and build a metric of the similarity between the policy announcement and Twitter traffic before and after the announcement. We interpret large changes in the similarity of tweets and announcements as a proxy for monetary policy surprise and show that market volatility spikes after the announcement whenever changes in similarity are high. These findings suggest that social media discussions on central bank communication are aligned with bond and stock market reactions.
    Keywords: monetary policy, central bank communication, financial markets, social media, Twitter, Federal Reserve, European Central Bank, Bank of England
    JEL: E44 E52 E58 G14 G15 G41
    Date: 2021
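Item 9 measures how similar Twitter traffic is to the policy announcement before and after its release. The abstract does not describe the exact text representation, so the sketch below assumes a plain bag-of-words cosine similarity; the helper names are hypothetical. A large positive `similarity_change` corresponds to what the authors read as a surprise proxy.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-cased word counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between the word-count vectors of two texts."""
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(va[w] * vb[w] for w in va)  # missing words count as 0
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def similarity_change(announcement, tweets_before, tweets_after):
    """Similarity of post-announcement tweets minus pre-announcement tweets."""
    before = cosine_similarity(announcement, " ".join(tweets_before))
    after = cosine_similarity(announcement, " ".join(tweets_after))
    return after - before
```

Swapping `bag_of_words` for TF-IDF weights or sentence embeddings would keep the same before/after comparison.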
  10. By: Dylan Herman; Cody Googin; Xiaoyuan Liu; Alexey Galda; Ilya Safro; Yue Sun; Marco Pistoia; Yuri Alexeev
    Abstract: Quantum computers are expected to surpass the computational capabilities of classical computers during this decade and to have a transformative impact on numerous industry sectors, particularly finance. In fact, finance is estimated to be the first industry sector to benefit from quantum computing, not only in the medium and long terms, but even in the short term. This survey paper presents a comprehensive summary of the state of the art of quantum computing for financial applications, with particular emphasis on Monte Carlo integration, optimization, and machine learning, showing how these solutions, adapted to work on a quantum computer, can help solve problems such as derivative pricing, risk analysis, portfolio optimization, natural language processing, and fraud detection more efficiently and accurately. We also discuss the feasibility of these algorithms on near-term quantum computers with various hardware implementations and demonstrate how they relate to a wide range of use cases in finance. We hope this article will not only serve as a reference for academic researchers and industry practitioners but also inspire new ideas for future research.
    Date: 2022–01
  11. By: Montobbio, Fabio (Università Cattolica del Sacro Cuore, BRICK, Collegio Carlo Alberto, and ICRIOS, Bocconi University); Staccioli, Jacopo (Università Cattolica del Sacro Cuore, and Institute of Economics, Scuola Superiore Sant’Anna); Virgillito, Maria Enrica (Institute of Economics, Scuola Superiore Sant’Anna, and Università Cattolica del Sacro Cuore); Vivarelli, Marco (UNU-MERIT, Maastricht University, IZA, and Università Cattolica del Sacro Cuore)
    Abstract: This paper represents one of the first attempts at building a direct measure of occupational exposure to robotic labour-saving technologies. After identifying the robotic and labour-saving (LS) robotic patents retrieved by Montobbio et al. (2022), the underlying 4-digit CPC definitions are employed to detect functions and operations performed by technological artefacts that are more directed at substituting the labour input. This measure yields fine-grained information on tasks and occupations according to their similarity ranking. Occupational exposure is then studied in relation to wage and employment dynamics in the United States, complemented by an investigation of industry and geographical penetration rates.
    Keywords: Labour-Saving Technology, Natural Language Processes, Labour Markets, Technological Unemployment
    JEL: O33 J24
    Date: 2021–11–25
  12. By: Santarelli, Enrico (Department of Economics, University of Bologna, and Department of Economics and Management, University of Luxembourg); Staccioli, Jacopo (Department of Economic Policy, Catholic University of the Sacred Heart, and Institute of Economics, Sant’Anna School of Advanced Studies); Vivarelli, Marco (UNU-MERIT, Maastricht University, and Department of Economic Policy, Catholic University of the Sacred Heart, and Forschungsinstitut zur Zukunft der Arbeit GmbH (IZA))
    Abstract: Using the entire population of USPTO patent applications published between 2002 and 2019, and leveraging both patent classification and semantic analysis, this paper aims to map the current knowledge base centred on robotics and AI technologies. These technologies are investigated both as a whole and by distinguishing between core and related innovations, along a 4-level core-periphery architecture. Merging patent applications with the Orbis IP firm-level database allows us to put forward a twofold analysis based on industry of activity and geographic location. In a nutshell, results show that: (i) rather than representing a technological revolution, the new knowledge base is closely linked to the previous technological paradigm; (ii) the new knowledge base is characterised by a considerable – but not impressively widespread – degree of pervasiveness; (iii) robotics and AI are closely related and converging (particularly among the related technologies and in more recent times), jointly shaping a new knowledge base that should be considered as a whole rather than as two separate general purpose technologies (GPTs); (iv) US technological leadership is confirmed, although it is declining in relative terms in favour of Asian countries such as South Korea, China and, more recently, India.
    Keywords: Robotics, Artificial Intelligence, General Purpose Technology, Technological Paradigm, Industry
    JEL: O25 O31 O33 O34
    Date: 2022–01–17
  13. By: Narita, Yusuke; Yata, Kohei
    Abstract: Algorithms produce a growing portion of decisions and recommendations in both policy and business. Such algorithmic decisions are natural experiments (conditionally quasi-randomly assigned instruments), since the algorithms make decisions based only on observable input variables. We use this observation to develop a treatment-effect estimator for a class of stochastic and deterministic decision-making algorithms. Our estimator is shown to be consistent and asymptotically normal for well-defined causal effects. A key special case of our estimator is a multidimensional regression discontinuity design. We apply our estimator to evaluate the effect of the Coronavirus Aid, Relief, and Economic Security (CARES) Act, under which hundreds of billions of dollars' worth of relief funding is allocated to hospitals via an algorithmic rule. Our estimates suggest that the relief funding has little effect on COVID-19-related hospital activity levels. Naive OLS and IV estimates exhibit substantial selection bias.
    Date: 2022–01
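Item 13 rests on the idea that an algorithm's decision, although deterministic given its inputs, is as good as random for units close to its decision boundary. A Monte Carlo sketch of that idea: average the algorithm's decision over a small box around each input. The box half-width `delta`, the uniform sampling and the two-threshold `eligibility_rule` are our assumptions for illustration, not the CARES Act formula or the authors' estimator.

```python
import random


def local_treatment_share(algorithm, x, delta, draws=10_000, seed=0):
    """Share of points in a box of half-width delta around x that the
    algorithm would treat. Values strictly between 0 and 1 flag units
    whose assignment varies locally, i.e. units near the boundary."""
    rng = random.Random(seed)
    treated = 0
    for _ in range(draws):
        z = [xi + rng.uniform(-delta, delta) for xi in x]
        treated += 1 if algorithm(z) else 0
    return treated / draws


def eligibility_rule(z):
    """Hypothetical two-dimensional threshold rule."""
    return z[0] >= 0.5 and z[1] >= 0.3
```

Units deep inside or outside the eligible region get shares of exactly 1 or 0 and carry no local randomness; only units near the thresholds do, which is roughly why a multidimensional regression discontinuity design appears as a special case.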
  14. By: Matteo Bonato (Department of Economics and Econometrics, University of Johannesburg, Auckland Park, South Africa; IPAG Business School, 184 Boulevard Saint-Germain, 75006 Paris, France); Oguzhan Cepni (Copenhagen Business School, Department of Economics, Porcelaenshaven 16A, Frederiksberg DK-2000, Denmark); Rangan Gupta (Department of Economics, University of Pretoria, Private Bag X20, Hatfield 0028, South Africa); Christian Pierdzioch (Department of Economics, Helmut Schmidt University, Holstenhofweg 85, P.O.B. 700822, 22008 Hamburg, Germany)
    Abstract: We find that climate-related risks forecast the intraday-data-based realized volatility of exchange-rate returns of eight major fossil-fuel exporters (Australia, Brazil, Canada, Malaysia, Mexico, Norway, Russia, and South Africa). We study a wide array of metrics capturing risks associated with climate change, derived directly from data on variables such as abnormal temperature patterns. We control for various other moments (realized skewness, realized kurtosis, realized good and bad variance, upside and downside tail risk, and jumps) and estimate our forecasting models using random forests, a machine-learning technique tailored to analyze models with many predictors.
    Keywords: Climate Risks, Commodity Currencies, Realized Variance, Forecasting
    JEL: C22 C53 F31 Q54
    Date: 2022–02
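Item 14 works with realized moments computed from intraday returns. A minimal sketch of common textbook definitions: realized variance as the sum of squared intraday returns, realized "good" and "bad" variance as the parts coming from positive and negative returns, and realized skewness and kurtosis scaled by the number of returns. The input is a hypothetical list of one day's intraday log returns; the paper's exact construction may differ.

```python
import math


def realized_measures(returns):
    """Daily realized moments from a list of intraday log returns.

    Assumes at least one non-zero return (otherwise rv is zero and the
    standardized moments are undefined).
    """
    n = len(returns)
    rv = sum(r * r for r in returns)                 # realized variance
    good = sum(r * r for r in returns if r > 0)      # upside semivariance
    bad = sum(r * r for r in returns if r < 0)       # downside semivariance
    skew = math.sqrt(n) * sum(r ** 3 for r in returns) / rv ** 1.5
    kurt = n * sum(r ** 4 for r in returns) / rv ** 2
    return {"rv": rv, "vol": math.sqrt(rv), "good": good, "bad": bad,
            "skew": skew, "kurt": kurt}
```

Good and bad variance always add up to realized variance, so using them as separate predictors only splits volatility by the sign of the returns.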
  15. By: Patsali, Sofia (Université Côte d'Azur, GREDEG, and Université de Strasbourg, BETA, CNRS France); Pezzoni, Michele (Université Côte d'Azur, GREDEG, CNRS, Observatoire des Sciences et Techniques, HCERES, OFCE, Sciences Po, and ICRIOS, Bocconi University, Italy); Visentin, Fabiana (UNU-MERIT, Maastricht University)
    Abstract: This study investigates the effect of research independence during the PhD period on students' career outcomes. We use a unique and detailed dataset on the French population of STEM PhD students who graduated between 1995 and 2013. To measure research independence, we compare the PhD thesis content with the supervisor's research, employing advanced neural network text analysis techniques to evaluate the similarity between the student's thesis abstract and the supervisor's publications during the PhD period. After exploring which characteristics of the PhD training experience and of the supervisor explain the level of research similarity, we estimate how similarity is associated with the likelihood of pursuing a research career. We find that the similarity of the student's thesis to her supervisor's research work is negatively associated with starting a career in academia and with the probability of patenting. Increasing the PhD-supervisor similarity score by one standard deviation is associated with a 2.1 percentage point decrease in the probability of obtaining an academic position and a 0.57 percentage point decrease in the probability of patenting. However, conditional on starting an academic career, PhD-supervisor similarity is associated with higher student productivity after graduation, as measured by citations received, network size, and the probability of moving to a foreign or US-based affiliation.
    Keywords: Research independence, Early career researchers, Scientific career outcomes, Neural network text analysis
    JEL: D22 O30 O33 O38
    Date: 2021–10–15
  16. By: Brigitte Roth Tran
    Abstract: I apply a novel machine-learning-based “weather index” method to daily store-level sales data for a national apparel and sporting goods brand to examine short-run responses to weather and long-run adaptation to climate. I find that even when considering potentially offsetting shifts of sales between outdoor and indoor stores, to the firm's website, or over time, weather has significant persistent effects on sales. This suggests that weather may increase sales volatility as more severe weather shocks become more frequent under climate change. Consistent with adaptation to climate, I find that the sensitivity of sales to weather decreases with historical experience for precipitation, snow, and cold weather events, but, surprisingly, not for extreme heat events. This suggests that adaptation may moderate some but not all of the adverse impacts of climate change on sales. Retailers can respond by adjusting their staffing, inventory, promotion events, compensation, and financial reporting.
    Keywords: adaptation; climate change; weather; machine learning; retail; sales
    JEL: Q54 L81 D12
    Date: 2022–01–21
  17. By: Ollech, Daniel
    Abstract: The COVID-19 pandemic has increased the need for timely and granular information to assess the state of the economy in real time. Weekly and daily indices have been constructed using higher frequency data to address this need. Yet the seasonal and calendar adjustment of the underlying time series is challenging. Here, we analyse the features and idiosyncrasies of such time series relevant in the context of seasonal adjustment. Drawing on a set of time series for Germany - namely hourly electricity consumption, daily truck toll mileage, and weekly Google Trends data - used in many countries to assess economic development during the pandemic, we discuss obstacles, difficulties, and adjustment options. Furthermore, we develop a taxonomy of the central features of seasonal higher frequency time series.
    Keywords: COVID-19, DSA, Calendar adjustment, Time series characteristics
    JEL: C14 C22 C87 E66
    Date: 2021
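Item 17 concerns seasonal adjustment of daily data, where a day-of-week pattern sits on top of annual patterns and holidays. The following is deliberately naive and is not the DSA procedure or any method from the paper: it estimates multiplicative day-of-week factors as the average ratio of each observation to a centred 7-day moving average, normalizes them to average one, and divides them out. The function names are hypothetical.

```python
from statistics import fmean


def day_of_week_factors(series):
    """Multiplicative weekday factors for a daily series whose first
    observation is weekday 0. Needs at least 13 observations so that
    every weekday has at least one centred moving-average ratio."""
    ratios = {d: [] for d in range(7)}
    for t in range(3, len(series) - 3):
        trend = fmean(series[t - 3 : t + 4])   # centred 7-day moving average
        ratios[t % 7].append(series[t] / trend)
    factors = {d: fmean(r) for d, r in ratios.items()}
    scale = fmean(factors.values())            # normalize factors to mean one
    return {d: f / scale for d, f in factors.items()}


def adjust(series, factors):
    """Divide each observation by its weekday factor."""
    return [x / factors[t % 7] for t, x in enumerate(series)]
```

This ignores annual seasonality, moving holidays and outliers, which is precisely where the challenges discussed in the paper arise.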
  18. By: Daniel Goller; Andrea Diem; Stefan C. Wolter
    Abstract: Higher education brings together students from diverse educational backgrounds, including students who, after dropping out of a first course of study, transferred to an academically less demanding institution. While peers are important contributors to student success, the influence on first-time students of those dropouts with a knowledge advantage is largely unexplored. Using an administrative data set covering every individual in the Swiss higher education system, we study the impact of the presence of academically better prepared students on the study success of first-time students. Our identification strategy relies on conditional idiosyncratic variation in the proportion of returning dropouts in university of applied sciences cohorts. We find negative effects of university dropouts who re-enroll in the same subject on the success of first-time students. In contrast, dropouts who change subjects are positively associated with the success of their new peers. Using causal machine learning methods, we find that the effects (a) are non-linear and (b) vary with the proportion of dropouts in university of applied sciences cohorts.
    Keywords: University dropouts, peer effects, better prepared students, causal machine learning
    JEL: A23 C14 I23
    Date: 2022–01

This nep-big issue is ©2022 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.