nep-big New Economics Papers
on Big Data
Issue of 2021‒11‒15
twenty-six papers chosen by
Tom Coupé
University of Canterbury

  1. The Local Economic Impact of Mineral Mining in Africa: Evidence from Four Decades of Satellite Imagery By Sandro Provenzano; Hannah Bull
  2. Central Bank Tone and the Dispersion of Views within Monetary Policy Committees By Paul Hubert; Fabien Labondance
  3. Speaking the same language: A machine learning approach to classify skills in Burning Glass Technologies data By Julie Lassébie; Luca Marcolin; Marieke Vandeweyer; Benjamin Vignal
  4. Machine learning in explaining nonprofit organizations’ participation : a driving factors analysis approach By Zhanxue Gong; Xiyuan Li; Jiawen Liu; Yeming Gong
  6. Multidimensionality of Land Ownership Among Men and Women in Sub-Saharan Africa: A Machine Learning Clustering Exercise By Kilic, Talip; Hasanbasri, Ardina; Moylan, Heather; Koolwal, Gayatri
  7. Quantification of Economic Uncertainty: a deep learning approach By Gillmann, Niels; Kim, Alisa
  8. Data-driven Hedging of Stock Index Options via Deep Learning By Jie Chen; Lingfei Li
  9. A Meta-Method for Portfolio Management Using Machine Learning for Adaptive Strategy Selection By Damian Kisiel; Denise Gorse
  10. Finding Needles in Haystacks: Multiple-Imputation Record Linkage Using Machine Learning By John M. Abowd; Joelle Abramowitz; Margaret C. Levenstein; Kristin McCue; Dhiren Patki; Trivellore Raghunathan; Ann M. Rodgers; Matthew D. Shapiro; Nada Wasi; Dawn Zinsser
  11. AI Watch. Defining Artificial Intelligence 2.0. Towards an operational definition and taxonomy of AI for the AI landscape By Sofia Samoili; Montserrat Lopez Cobo; Blagoj Delipetrev; Fernando Martinez-Plumed; Emilia Gomez; Giuditta De Prato
  12. EU in the global Artificial Intelligence landscape By RIGHI Riccardo; LOPEZ COBO Montserrat; SAMOILI Sofia; CARDONA Melisande; VAZQUEZ-PRADA BAILLET Miguel; DE PRATO Giuditta
  13. Intelligence artificielle, croissance et emploi : le rôle des politiques By Philippe Aghion; Céline Antonin; Simon Bunel
  14. Understanding corporate default using Random Forest: The role of accounting and market information By Alessandro Bitetto; Stefano Filomeni; Michele Modina
  15. Boosting Tax Revenues with Mixed-Frequency Data in the Aftermath of Covid-19: The Case of New York By Kajal Lahiri; Cheng Yang
  17. Global Evidence on Misperceptions and Preferences for Redistribution By Jennifer Elena Feichtmayer; Klaus Gründler
  18. Deep Learning Algorithms for Hedging with Frictions By Xiaofei Shi; Daran Xu; Zhanhao Zhang
  19. "Deep Asymptotic Expansion with Weak Approximation" By Yuga Iguchi; Riu Naito; Yusuke Okano; Akihiko Takahashi; Toshihiro Yamada
  20. Deep Asymptotic Expansion: Application to Financial Mathematics(forthcoming in proceedings of IEEE CSDE 2021) By Yuga Iguchi; Riu Naito; Yusuke Okano; Akihiko Takahashi; Toshihiro Yamada
  21. Quantifying Political Populism and Examining the Link with Economic Insecurity: evidence from Greece By Raphael Ntentas
  22. Promise not Fulfilled: FinTech Data Privacy, and the GDPR By Gregor Dorfleitner; Lars Hornuf; Julia Kreppmeier
  23. Digital Platform Markets of ASEAN and India: Implications for Cooperation with Korea By Kim, Jeong Gon; Na, Seung Kwon; Lee, Jaeho; Yun, ChiHyun; Kim, Eunmi
  24. occupation2vec: A general approach to representing occupations By Nicolaj Søndergaard Mühlbach
  25. Using Sparse Modeling to Detect Accounting Fraud (Japanese) By USUKI Teppei; KONDO Satoshi; SHIRAKI Kengo; MASADA Takahiro; SUZAKI Kosuke; MIYAKAWA Daisuke
  26. Mark my words: the transmission of central bank communication to the general public via the print media By Munday, Tim; Brookes, James

  1. By: Sandro Provenzano; Hannah Bull
    Abstract: This paper assembles large archives of satellite imagery to provide novel insights on how mine openings and closings impact the development of local communities in Africa. We collect 30m-resolution Landsat images between 1984 and 2019 from a 40km radius around 1,658 mineral deposits, covering 12% of the African landmass. Using state-of-the-art techniques in computer vision, we translate these images into economically meaningful indicators, including material wealth predictions as well as urban and agricultural land use. We then use stacked event studies and difference-in-difference models to estimate the local impact of mine openings and closings on these indicators. Our findings demonstrate that mine openings increase wealth and boost local urban growth and agricultural activities in the surrounding area. Furthermore, democratic institutions are a decisive factor for making mining a success for local communities. However, our results show that the fast growth in mining areas is only temporary. After the mines close, former mining areas cannot maintain elevated growth rates and revert to the same pace of development as areas without mines.
    Date: 2021–11
  2. By: Paul Hubert (OFCE - Observatoire français des conjonctures économiques - Sciences Po - Sciences Po); Fabien Labondance (OFCE - Observatoire français des conjonctures économiques - Sciences Po - Sciences Po)
    Abstract: Does policymakers' choice of words matter? We explore empirically whether central bank tone conveyed in FOMC statements contains useful information for financial market participants. We quantify central bank tone using computational linguistics and identify exogenous shocks to central bank tone orthogonal to the state of the economy. Using an ARCH model and a high-frequency approach, we find that positive central bank tone increases interest rates at the 1-year maturity. We therefore investigate which potential pieces of information could be revealed by central bank tone. Our tests suggest that it relates to the dispersion of views among FOMC members. This information may be useful to financial markets to understand current and future policy decisions. Finally, we show that central bank tone helps predict future policy decisions.
    Keywords: Optimism,FOMC,Dissent,Interest rate expectations,ECB
    Date: 2020–01–01
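Dictionary-based tone scoring of the kind used in this literature can be sketched as follows; the mini-lexicon and sample statements are invented stand-ins for the full dictionaries (e.g. Loughran-McDonald) such papers typically rely on.

```python
# Hypothetical mini-lexicon; real work uses much larger curated dictionaries
POSITIVE = {"growth", "strong", "improve", "robust", "expansion"}
NEGATIVE = {"weak", "decline", "risk", "uncertainty", "slowdown"}

def tone(statement: str) -> float:
    """Net tone: (positive - negative) / matched words; 0 if nothing matched."""
    words = statement.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / max(pos + neg, 1)

print(tone("strong growth despite some uncertainty"))  # net positive tone
```

Identifying tone shocks orthogonal to the state of the economy, as the paper does, would then require regressing such scores on macro controls and keeping the residual.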
  3. By: Julie Lassébie; Luca Marcolin; Marieke Vandeweyer; Benjamin Vignal
    Abstract: This report presents a methodology to classify skill requirements in online job postings into a pre-existing expert-driven taxonomy of broader skill categories. The proposed approach uses a semi-supervised Machine Learning algorithm and relies on the actual meaning and definition of the skills. It allows for the classification of more than 17 000 unique skill keywords contained in the Burning Glass dataset into 61 categories. The outcome of the classification exercise is validated using O*NET information on skills by occupations, and by benchmarking the results of some empirical descriptive exercises against the existing literature. Compared to a manual classification, the proposed approach organises large amounts of skills information in an analytically tractable form, and with considerable savings in time and human resources.
    JEL: C45 C55 J23 J24 J63
    Date: 2021–11–11
  4. By: Zhanxue Gong (emlyon business school); Xiyuan Li; Jiawen Liu; Yeming Gong
    Abstract: The construction of smart cities requires the participation of nonprofit organizations, but the driving factors behind that participation remain insufficiently analysed. This study therefore constructs a machine-learning-based model of the relationship between public satisfaction and nonprofit organizations' participation in smart-city construction planning, using structural equation modelling as the research method. Corresponding hypotheses are formulated and data are collected through questionnaires, with responses scored on a ten-point Likert scale and analysed with deep learning in conjunction with the model. The research shows that the model established in this study yields good analytical results and has practical value: it can provide suggestions for optimization and theoretical references for subsequent research.
    Keywords: public satisfaction,smart city,non-profit organization,Machine Learning,AI-based Management,Artificial Intelligence,Machine learning
    Date: 2019–12–01
  5. By: Rasolomanana, Onjaniaina Mianin’Harizo
    Abstract: This paper presents an ensemble neural network using a small data set in the context of bankruptcy prediction. The individual models of the ensemble use different data of different types. We compare the performance of three neural network models: one using a single type of data, one using a combination of both data types in a single data frame, and one using ensemble learning. The results show that the ensemble model outperformed both the individual model and the combined model. This suggests that with scarce training data, especially when using different types of data, an ensemble neural network can improve prediction accuracy.
    Keywords: ensemble neural network, small dataset, combined data, bankruptcy prediction
    Date: 2021–10
  6. By: Kilic, Talip; Hasanbasri, Ardina; Moylan, Heather; Koolwal, Gayatri
    Keywords: Labor and Human Capital, Research and Development/Tech Change/Emerging Technologies
    Date: 2021–08
  7. By: Gillmann, Niels; Kim, Alisa
    JEL: E00
    Date: 2021
  8. By: Jie Chen; Lingfei Li
    Abstract: We develop deep learning models to learn the hedge ratio for S&P500 index options directly from options data. We compare different combinations of features and show that a feedforward neural network model with time to maturity, Black-Scholes delta and a sentiment variable (VIX for calls and index return for puts) as input features performs the best in the out-of-sample test. This model significantly outperforms the standard hedging practice that uses the Black-Scholes delta and a recent data-driven model. Our results demonstrate the importance of market sentiment for hedging efficiency, a factor previously ignored in developing hedging strategies.
    Date: 2021–11
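A minimal sketch of this kind of data-driven hedge-ratio learner, using scikit-learn's MLPRegressor on synthetic data; the feature construction and toy target below are illustrative assumptions, not the paper's actual specification:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n = 2000
# Illustrative features: time to maturity, Black-Scholes delta, sentiment (VIX)
ttm = rng.uniform(0.05, 1.0, n)
bs_delta = rng.uniform(0.05, 0.95, n)
vix = rng.uniform(10.0, 40.0, n)
X = np.column_stack([ttm, bs_delta, vix])
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize inputs

# Toy target: hedge ratio as a sentiment-dependent tilt of the BS delta
y = bs_delta * (1.0 - 0.3 * (vix - 25.0) / 25.0 * np.sqrt(ttm))

# Feedforward net maps (maturity, delta, sentiment) -> hedge ratio
net = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
net.fit(X[:1500], y[:1500])
r2 = net.score(X[1500:], y[1500:])  # out-of-sample fit
```

The paper's contribution is precisely that adding a sentiment input (VIX for calls, index return for puts) improves such out-of-sample hedging performance over the plain Black-Scholes delta.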
  9. By: Damian Kisiel; Denise Gorse
    Abstract: This work proposes a novel portfolio management technique, the Meta Portfolio Method (MPM), inspired by the successes of meta approaches in the field of bioinformatics and elsewhere. The MPM uses XGBoost to learn how to switch between two risk-based portfolio allocation strategies, Hierarchical Risk Parity (HRP) and the more classical Naïve Risk Parity (NRP). It is demonstrated that the MPM is able to successfully take advantage of the best characteristics of each strategy (the NRP's fast growth during market uptrends, and the HRP's protection against drawdowns during market turmoil). As a result, the MPM is shown to possess an excellent out-of-sample risk-reward profile, as measured by the Sharpe ratio, and in addition offers a high degree of interpretability of its asset allocation decisions.
    Date: 2021–11
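The strategy-switching idea can be sketched with a gradient-boosted classifier; the paper uses XGBoost, for which scikit-learn's GradientBoostingClassifier serves as a stand-in here, and the market features and labels below are synthetic:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(1)
n = 1000
trend = rng.normal(0.0, 1.0, n)    # toy market trend signal
vol = rng.uniform(0.1, 0.5, n)     # toy market volatility
X = np.column_stack([trend, vol])

# Toy labels: NRP (1) wins in calm uptrends, HRP (0) wins in turmoil,
# mirroring the behaviour the abstract attributes to each strategy
label = ((trend > 0) & (vol < 0.3)).astype(int)

# The meta-model learns which allocation strategy to deploy next period
clf = GradientBoostingClassifier(random_state=0).fit(X[:800], label[:800])
acc = clf.score(X[800:], label[800:])
```

Tree-based meta-models also make the interpretability claim concrete: feature importances show which market conditions drive each switch.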
  10. By: John M. Abowd; Joelle Abramowitz; Margaret C. Levenstein; Kristin McCue; Dhiren Patki; Trivellore Raghunathan; Ann M. Rodgers; Matthew D. Shapiro; Nada Wasi; Dawn Zinsser
    Abstract: This paper considers the problem of record linkage between a household-level survey and an establishment-level frame in the absence of unique identifiers. Linkage between frames in this setting is challenging because the distribution of employment across establishments is highly skewed. To address these difficulties, this paper develops a probabilistic record linkage methodology that combines machine learning (ML) with multiple imputation (MI). This ML-MI methodology is applied to link survey respondents in the Health and Retirement Study to their workplaces in the Census Business Register. The linked data reveal new evidence that non-sampling errors in household survey data are correlated with respondents’ workplace characteristics.
    Date: 2021–11
  11. By: Sofia Samoili (European Commission - JRC); Montserrat Lopez Cobo (European Commission - JRC); Blagoj Delipetrev (European Commission - JRC); Fernando Martinez-Plumed (Universitat Politecnica de Valencia); Emilia Gomez (European Commission - JRC); Giuditta De Prato (European Commission - JRC)
    Abstract: We present here the second edition of our research aimed at establishing an operational definition of artificial intelligence (AI), to which we refer in the activities of AI Watch. This edition builds on the first report, published in February 2020, and complements it with several recent developments. Since then, the European Commission has proposed a regulatory framework on artificial intelligence (AI Act) that establishes a legal definition of AI, which we incorporate in the current review. In addition to this legal definition, an operational definition is still needed to better delineate the boundaries and analysis of the AI Watch AI landscape. The proposed AI Watch operational definition consists of an iterative method providing a concise taxonomy and list of keywords that characterise the core domains of the AI research field, complemented by transversal topics such as AI applications or ethical and philosophical considerations - in line with the wider monitoring objective of AI Watch. The AI taxonomy is designed to inform the AI Watch AI landscape analysis and is also expected to cover applications of AI in closely related technological domains, such as robotics (in a broader sense), neuroscience or internet of things. The literature considered for the qualitative analysis of existing definitions and taxonomies has been enlarged to include recently published reports from the three complementary perspectives considered in this work: policy, research and industry. Therefore, the collection of definitions published between 1955 and 2021 and the summary of the main features of the concept of AI appearing in the relevant literature is another valuable output of this work. Finally, alternative approaches to study AI are also briefly presented in this new edition of the report.
These include the classification of AI according to: families of algorithms and the theoretical models behind them; cognitive abilities reproduced by AI; functions performed by AI. Applications of AI may be grouped also according to other dimensions, like the economic sector in which such applications are found, or their business functions. These approaches, complementary to the taxonomy used for the analysis of the AI Watch international landscape, are useful to gain a wider understanding of the AI domain, and suitable to be used in studies related to these dimensions.
    Keywords: artificial intelligence, ai watch, ai definition, ai taxonomy, ai keywords
    Date: 2021–10
  12. By: RIGHI Riccardo (European Commission - JRC); LOPEZ COBO Montserrat (European Commission - JRC); SAMOILI Sofia (European Commission - JRC); CARDONA Melisande (European Commission - JRC); VAZQUEZ-PRADA BAILLET Miguel (European Commission - JRC); DE PRATO Giuditta (European Commission - JRC)
    Abstract: The brief presents the results of the AI worldwide ecosystem analysis for the period 2009-2020, applying the Techno-Economic ecoSystem (TES) analytical approach. The TES approach makes it possible to map the AI worldwide ecosystem by considering the main AI-related industrial, innovation and research activities, and all the economic players involved in them (i.e. firms, research institutes, governmental institutions). The brief analyses the position of the EU in the international context, vis-à-vis the United States, China, and other main players in the landscape, in terms of size of the AI ecosystem, specialisation in AI areas, AI firms and AI R&D capacities. It follows with an in-depth analysis of the EU ecosystem, with a section devoted to the impact of EC-funded projects on the EU AI ecosystem.
    Keywords: artificial intelligence, ecosystem, ai firms, ai R&D
    Date: 2021–11
  13. By: Philippe Aghion (Harvard University [Cambridge]); Céline Antonin (OFCE - Observatoire français des conjonctures économiques - Sciences Po - Sciences Po); Simon Bunel
    Abstract: In this survey paper, we argue that the effects of artificial intelligence (AI) and automation on growth and employment depend to a large extent on institutions and policies. We develop a two‑fold analysis. In a first section, we survey the most recent literature to show that AI can spur growth by replacing labor by capital, both in the production of goods and services and in the production of ideas. Yet, we argue that AI may inhibit growth if combined with inappropriate competition policy. In a second section, we discuss the effect of robotization on employment in France over the 1994‑2014 period. Based on our empirical analysis on French data, we first show that robotization reduces aggregate employment at the employment zone level, and second that non‑educated workers are more negatively affected by robotization than educated workers. This finding suggests that inappropriate labor market and education policies reduce the positive impact that AI and automation could have on employment.
    Keywords: Artificial intelligence,Growth,Automation,Robots,Employment,Intelligence artificielle,Croissance,Automatisation,Emploi
    Date: 2019–12
  14. By: Alessandro Bitetto (University of Pavia); Stefano Filomeni (University of Essex); Michele Modina (University of Molise)
    Abstract: Recent evidence highlights the importance of hybrid credit scoring models to evaluate borrowers' creditworthiness. However, current hybrid models neglect the role of public-peer market information, in addition to accounting information, in default prediction. This paper aims to fill this gap in the literature by providing novel evidence on the impact of market information in predicting corporate defaults for unlisted firms. We employ a sample of 10,136 Italian micro-, small-, and mid-sized enterprises (MSMEs) that borrow from 113 cooperative banks over 2012–2014 to examine whether market pricing of public firms adds information to accounting measures in predicting default of private firms. Specifically, we estimate the probability of default (PD) of MSMEs using the equity prices of size- and industry-matched public firms, and then adopt advanced statistical techniques based on a parametric algorithm (Multivariate Adaptive Regression Splines) and a non-parametric machine learning model (Random Forest). Moreover, using Shapley values, we assess the relevance of market information in predicting corporate credit risk. First, we show the predictive power of Merton's PD for default prediction for unlisted firms. Second, we show the increased predictive power of credit risk models that consider both Merton's PD and accounting information to assess corporate credit risk. We trust the results of this paper contribute to the current debate on safeguarding the continuity and resilience of the banking sector. Indeed, banks' hybrid credit scoring methodologies that also embed market information prove successful in assessing the credit risk of unlisted firms and could be useful for forward-looking financial risk management frameworks.
    Keywords: Default Risk, Distance to Default, Machine Learning, Merton model, SME, PD, SHAP, Autoencoder, Random Forest, XAI
    JEL: C52 C53 D82 D83 G21 G22
    Date: 2021–10
  15. By: Kajal Lahiri; Cheng Yang
    Abstract: We forecast New York state tax revenues with a mixed-frequency model using a number of machine learning techniques. We found boosting with two dynamic factors extracted from a select list of New York and U.S. leading indicators did best in terms of correctly updating revenues for the fiscal year in direct multi-step out-of-sample forecasts. These forecasts were found to be informationally efficient over 18 monthly horizons. In addition to boosting with factors, we also studied the advisability of restricting boosting to select the most recent macro variables to capture abrupt structural changes. Since the COVID-19 pandemic upended all government budgets, our boosted forecasts were used to monitor revenues in real time for the fiscal year 2021. Our estimates showed a drastic year-over-year decline in real revenues by over 16% in May 2020, followed by several upward nowcast revisions that led to a recovery to -1% in March 2021, which was close to the actual annual value of -1.6%.
    Keywords: revenue forecasting, machine learning, real time forecasting, mixed frequency, fiscal policy
    JEL: C22 C32 C50 C53 E62
    Date: 2021
  16.
    Abstract: In this article we estimate the determinants of broadband penetration in Europe. We use data from the European Innovation Scoreboard of the European Commission for 37 countries in the period 2010-2019. We apply Panel Data with Fixed Effects, Panel Data with Random Effects, WLS, OLS and Dynamic Panel. We find that the level of “Broadband Penetration” in Europe is positively associated with “Enterprises Providing ICT Training”, “Innovative Sales Share”, “Intellectual Assets”, “Knowledge-Intensive Service Exports”, “Turnover Share SMEs” and “Innovation Friendly Environment”, and negatively associated with “Government procurement of advanced technology products”, “Sales Impact”, “Firm Investments”, “Opportunity-Driven Entrepreneurship”, “Most Cited Publications” and “Rule of Law”. In addition, we perform a clustering with the k-Means algorithm optimized with the Silhouette coefficient and find three distinct clusters. Finally, we apply eight machine learning algorithms to predict the level of “Broadband Penetration” in Europe; we find that the Polynomial Regression algorithm is the best predictor and that the level of the variable is expected to increase by 10.4%.
    Keywords: General; Innovation and Invention: Processes and Incentives; Management of Technological Innovation and R&D; Technological Change: Choices and Consequences; Intellectual Property and Intellectual Capital.
    JEL: O30 O31 O32 O33 O34
    Date: 2021–10–31
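The k-Means-with-Silhouette step described above can be sketched as follows, choosing the number of clusters that maximises the Silhouette coefficient on synthetic data with three planted clusters:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with three well-separated clusters
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

# Fit k-Means for each candidate k and score the partition with the
# Silhouette coefficient; pick the k with the highest score
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
best_k = max(scores, key=scores.get)
```

On country-level indicator data the same loop would run over the standardized indicator matrix instead of the synthetic blobs.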
  17. By: Jennifer Elena Feichtmayer; Klaus Gründler
    Abstract: Individuals often hold erroneous beliefs about their socio-economic status relative to others. We develop a new machine learning technique to measure these misperceptions and use large-scale international survey data to compute status misperception for 241,757 households from 97 countries (24 OECD, 73 non-OECD). We show that status misperception is a widespread phenomenon across the globe. Upward-biased perceptions are associated with lower preferences for redistribution and have direct consequences for welfare provision via the tax and transfer system. The effect accounts for approximately 9% of the variation in redistribution preferences, is independent of socio-demographic characteristics, robust to measurement errors in social surveys, and occurs similarly when we change the underlying micro data or examine party preferences.
    Keywords: misperceptions, machine learning, socio-economic status, preferences, redistribution, welfare provision, taxes and transfers
    JEL: D31 H53 I30 C43
    Date: 2021
  18. By: Xiaofei Shi; Daran Xu; Zhanhao Zhang
    Abstract: This work studies optimal hedging problems in frictional markets with general convex transaction costs on the trading rates. We show that, under a smallness assumption on the magnitude of the transaction costs, the leading-order approximation of the optimal trading speed can be identified through the solution to a nonlinear SDE. Unfortunately, models with arbitrary state dynamics generally lead to a nonlinear forward-backward SDE system for which well-posedness results are unavailable. However, we can numerically find the optimal trading strategy with modern deep learning algorithms. Among various deep learning structures, the most popular choices are the FBSDE solver introduced in the spirit of [32] and the deep hedging algorithm pioneered by [12, 14, 15, 16, 35, 36, 45, 47]. We implement these deep learning algorithms with calibrated parameters from [26] and compare the numerical results with the leading-order approximations. This work documents the performance of the different learning-based algorithms and provides a better understanding of their advantages and drawbacks.
    Date: 2021–11
  19. By: Yuga Iguchi (MUFG Bank); Riu Naito (Japan Post Insurance and Hitotsubashi University); Yusuke Okano (SMBC Nikko Securities); Akihiko Takahashi (Faculty of Economics, The University of Tokyo); Toshihiro Yamada (Graduate School of Economics, Hitotsubashi University and Japan Science and Technology Agency (JST))
    Abstract: The paper proposes a new computational scheme for diffusion semigroups based on an asymptotic expansion with weak approximation and a deep learning algorithm to solve high-dimensional Kolmogorov partial differential equations (PDEs). In particular, we give a spatial approximation for the solution of d-dimensional PDEs on a range [a,b]^d without suffering from the curse of dimensionality.
    Date: 2021–11
  20. By: Yuga Iguchi (MUFG Bank, Tokyo, Japan & UCL London, UK); Riu Naito (Japan Post Insurance & Hitotsubashi University, Tokyo, Japan); Yusuke Okano (SMBC Nikko Securities, Tokyo, Japan); Akihiko Takahashi (University of Tokyo, Tokyo, Japan); Toshihiro Yamada (Hitotsubashi University & JST, Tokyo, Japan)
    Abstract: The paper proposes a new computational scheme for diffusion semigroups based on an asymptotic expansion with weak approximation and a deep learning algorithm to solve high-dimensional Kolmogorov partial differential equations (PDEs). In particular, we give a spatial approximation for the solution of d-dimensional PDEs on a range [a,b]^d without suffering from the curse of dimensionality.
    Date: 2021–11
  21. By: Raphael Ntentas
    Abstract: At this juncture of human history populism is ubiquitous and Greek politics constitute no exception. This paper sheds light on a methodology that quantifies political populism (i.e. parliamentary populist rhetoric) in Greece through a novel textual dataset, which includes 16.5 years filled with heated debates over times of economic peaks and valleys. Combining computer with human intelligence to identify populism based upon a creative dictionary and strict definitional guidelines that fit the Hellenic Parliament's context, helps one explore perspectives unimagined just a few years ago. Besides, as Greece has gone through a series of sharp, intense and generalized socio-economic shocks, this paper uses an OLS multiple regression analysis to test whether there is a link between economic insecurity and political populism. Ultimately, it provides empirical evidence on a weak link, indicating economic insecurity's minimal role in explaining the variation in political populism levels. Our results do offer some tentative insights into how political populism evolves in the country during 2004-2020, confirming the previous empirical finding that assigns higher levels of populism to December when heated parliamentary debates on the following year's budget occur. Lastly, the empirical results indicate that populism does not intensify in conditions of crises, in alignment with the findings of some of the latest cross-national studies.
    Keywords: Populism, Hellenic Parliament, Economic Insecurity, Big Data, Quantitative Text Analysis, Multiple Regression Analysis
    Date: 2021–11
  22. By: Gregor Dorfleitner; Lars Hornuf; Julia Kreppmeier
    Abstract: This article analyzes how the General Data Protection Regulation (GDPR) has affected the privacy practices of FinTech firms. We study the content of 308 privacy statements both before and after the GDPR became binding. Using textual analysis methods, we find that the readability of the privacy statements has decreased. The texts of privacy statements have become longer and use more standardized language, resulting in worse user comprehension. This calls into question whether the GDPR has achieved its original goal—the protection of natural persons regarding the processing of personal data. We also analyze the content of privacy statements and link it to company- and industry-specific determinants. Before the GDPR became binding, more external investors and a higher legal capital were related to a higher quantity of data processed and more transparency, but not thereafter. Finally, we document mimicking behavior among industry peers with regard to the data processed and transparency.
    Keywords: data privacy, FinTech, General Data Protection Regulation, privacy statement, textual analysis, financial technology
    JEL: K20 L81
    Date: 2021
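Readability scoring of the kind applied to the privacy statements can be sketched with the Flesch Reading Ease formula; the syllable counter below is a crude vowel-group heuristic and the two sample texts are invented:

```python
import re

def syllables(word: str) -> int:
    # Crude approximation: count contiguous vowel groups, at least one
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: 206.835 - 1.015*(words/sentence) - 84.6*(syllables/word)."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    n = max(1, len(words))
    syl = sum(syllables(w) for w in words)
    return 206.835 - 1.015 * (n / sentences) - 84.6 * (syl / n)

simple = flesch_reading_ease("We use your data. We keep it safe.")
legalese = flesch_reading_ease(
    "Notwithstanding the aforementioned stipulations, personally identifiable "
    "information may be processed pursuant to applicable regulatory obligations."
)
```

Longer sentences and more syllables per word push the score down, which is the sense in which post-GDPR statements "became less readable."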
  23. By: Kim, Jeong Gon; Na, Seung Kwon; Lee, Jaeho; Yun, ChiHyun; Kim, Eunmi
    Abstract: The growth of digital platform markets in ASEAN and India is prominent. With COVID-19, demand for economic and social activities centered on digital platforms is expected to rise further; five sectors in particular (e-commerce, sharing economy, education, healthcare and fintech) are growing fast. Korea is a potential partner of ASEAN countries and India. Korea's Digital New Deal policy now stresses tasks such as sharing and utilizing data, convergence of 5G and artificial intelligence across whole industries, spreading digital education, digital healthcare, etc., which are closely related to the economic and social needs of ASEAN countries and India. In order to promote regulatory harmonization and cooperation with ASEAN and India, Korea needs to promote digital economy and trade agreements.
    Keywords: ASEAN; India; Korea; digital platform; Digital New Deal policy
    Date: 2021–06–11
  24. By: Nicolaj Søndergaard Mühlbach
    Abstract: We propose occupation2vec, a general approach to representing occupations, which can be used in matching, predictive and causal modeling, and other economic areas. In particular, we use it to score occupations on any definable characteristic of interest, say the degree of 'greenness'. Using more than 17,000 occupation-specific descriptors, we transform each occupation into a high-dimensional vector using natural language processing. Similarly, we assign a vector to the target characteristic and estimate the occupational degree of this characteristic as the correlation between the vectors. The main advantages of this approach are its universal applicability and verifiability, in contrast to existing ad-hoc approaches. We extensively validate the approach on several exercises and then use it to estimate the occupational degree of charisma and emotional intelligence (EQ). We find that occupations that score high on these tend to have higher educational requirements and projected employment growth. Turning to wages, highly charismatic occupations are found in either the lower or the upper tail of the wage distribution. This is not the case for EQ, where higher levels of EQ are correlated with higher wages.
    Date: 2021–11
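The scoring step — embedding occupation descriptors and a target characteristic in the same vector space and comparing them — can be sketched with TF-IDF vectors and cosine similarity; the paper uses richer NLP embeddings, and the descriptor texts below are invented:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical descriptor texts, standing in for O*NET-style occupation descriptors
occupations = {
    "solar_installer": "install solar panels renewable energy roof outdoor work",
    "accountant": "prepare financial statements audit tax ledger reporting",
}
# Target characteristic, e.g. the degree of 'greenness'
target = "renewable energy environment sustainability green"

# Embed all texts in one TF-IDF space, then score each occupation
# against the characteristic vector
vec = TfidfVectorizer().fit(list(occupations.values()) + [target])
t = vec.transform([target])
scores = {name: cosine_similarity(vec.transform([text]), t)[0, 0]
          for name, text in occupations.items()}
```

Because the comparison is against an explicit, definable characteristic vector, the same pipeline works for 'greenness', charisma, EQ, or anything else one can describe in text.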
  25. By: USUKI Teppei; KONDO Satoshi; SHIRAKI Kengo; MASADA Takahiro; SUZAKI Kosuke; MIYAKAWA Daisuke
    Abstract: In this paper, we implement anomaly detection on listed firms' accounting items. Using a type of sparse modeling, i.e., Graphical Lasso, we confirm that our accounting fraud detection has achieved a practically admissible level of detection capability. We also find that the method of sparse modeling contributes to detection capability.
    Date: 2021–10
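A minimal sketch of Graphical-Lasso-based anomaly detection: fit a sparse precision matrix on correlated "accounting items" and flag observations whose Mahalanobis distance under that estimate is large. The data and the specific anomaly below are illustrative, not the paper's setup:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(2)
# Toy "accounting items": items 0 and 1 are strongly correlated, item 2 independent
cov = np.array([[1.0, 0.8, 0.0],
                [0.8, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
X = rng.multivariate_normal(np.zeros(3), cov, size=500)

# Sparse (L1-penalized) estimate of the precision matrix
gl = GraphicalLasso(alpha=0.05).fit(X)

def anomaly_score(x):
    """Mahalanobis distance under the sparse precision estimate."""
    d = x - gl.location_
    return float(d @ gl.precision_ @ d)

normal = anomaly_score(np.array([0.1, 0.1, 0.0]))
# This point breaks the learned correlation between items 0 and 1
anomaly = anomaly_score(np.array([3.0, -3.0, 0.0]))
```

Observations that violate the usual dependence structure among accounting items score high even when each item is individually unremarkable, which is the intuition behind using sparse modeling for fraud detection.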
  26. By: Munday, Tim (University of Oxford); Brookes, James (Bank of England)
    Abstract: We ask how central banks can change their communication in order to receive greater newspaper coverage. We write down a model of news production and consumption in which news generation is endogenous because the central bank must draft its communication in such a way that newspapers choose to report it, while still retaining the message the central bank wishes to convey to the public. We use our model to show that standard econometric techniques that correlate central bank text with measures of news coverage in order to determine what causes central bank communication to be reported on will likely prove to be biased. We use techniques from computational linguistics combined with an event-study methodology to measure the extent of news coverage a central bank communication receives, and the textual features that might cause a communication to be more (or less) likely to be considered newsworthy. We consider the case of the Bank of England, and estimate the relationship between news coverage and central bank communication implied by our model. We find that the interaction between the state of the economy and the way in which the Bank of England writes its communication is important for determining news coverage. We provide concrete suggestions for ways in which central bank communication can increase its news coverage by improving readability in line with our results.
    Keywords: Central bank communication; print media; high-dimensional estimation; natural language processing
    JEL: C01 C55 C82 E43 E52 E58
    Date: 2021–10–27

This nep-big issue is ©2021 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.