nep-big 2023-03-20 papers

on Big Data

Issue of 2023‒03‒20
23 papers chosen by
Tom Coupé
University of Canterbury

A cross-verified database of notable people, 3500BC-2018AD By Morgane Laouenan; Palaash Bhargava; Jean-Benoît Eyméoud; Olivier Gergaud; Guillaume Plique; Etienne Wasmer
Forecasting realized volatility in turbulent times using temporal fusion transformers By Frank, Johannes
FEDERAL STATE BUDGETARY EDUCATIONAL INSTITUTION OF HIGHER EDUCATION By GRACHEVA V.A.; PETROVA D.A.
The supply, demand and characteristics of the AI workforce across OECD countries By Andrew Green; Lucas Lamby
Using supervised machine learning to scale human‐coded data: A method and dataset in the board leadership context By Harrison, Joseph S.; Josefy, Matthew A.; Kalm, Matias; Krause, Ryan
Age, wealth, and the MPC in Europe: A supervised machine learning approach By Dutt, Satyajit; Radermacher, Jan W.
ddml: Double/Debiased Machine Learning in Stata By Ahrens, Achim; Hansen, Christian B.; Schaffer, Mark E; Wiemann, Thomas
Mass Valuation of Real Estate Using GIS-based Nominal Valuation and Machine Learning Methods By Muhammed Oguzhan Mete; Tahsin Yomralioglu
Six questions about the demand for artificial intelligence skills in labour markets By Fabio Manca
ECONOMIC FOUNDATIONS OF COMPETITION POLICY FOR DIGITAL ECOSYSTEMS By N. G. Korol; A. A. Kurdin; A. A. Morosanova
Modelling Subjective Attractiveness By Konrad Lewszyk; Piotr Wójcik
Fourth Industrial Revolution and Evolution of Data Science: Challenges for Official Statistics By Popoola, Osuolale Peter; Adeboye, Olawale Nureni
Gender differences in reference letters: Evidence from the Economics job market By Markus Eberhardt; Giovanni Facchini; Valeria Rueda
Data Driven Contagion Risk Management in Low-Income Countries using Machine Learning Applications with COVID-19 in South Asia By Abu S. Shonchoy; Moogdho M. Mahzab; Towhid I. Mahmood; Manhal Ali
Stock Broad-Index Trend Patterns Learning via Domain Knowledge Informed Generative Network By Jingyi Gu; Fadi P. Deek; Guiling Wang
Predicting Firm Exits with Machine Learning: Implications for Selection into COVID-19 Support and Productivity Growth By Lily Davies; Mark Kattenberg; Benedikt Vogt
Mamma Mia! Revealing hidden heterogeneity by PCA-biplot: MPC puzzle for Italy's elderly poor By Radermacher, Jan W.
What Makes a Program Good? Evidence from Short-Cycle Higher Education Programs in Five Developing Countries By Marina Bassi; Lelys Dinarte-Diaz; Maria Marta Ferreyra; Sergio Urzua
The Transformation of Public Policy Analysis in Times of Crisis – A Microsimulation-Nowcasting Method Using Big Data By O'Donoghue, Cathal; Sologon, Denisa Maria
Sentiment Spin: Attacking Financial Sentiment with GPT-3 By Markus Leippold
Extraterrestrial Artificial Intelligence: The Final Existential Risk? By Naudé, Wim
THE RELEVANT SCIENTIFIC AND METHODOLOGICAL APPROACHES TO ENSURING FUNDAMENTAL HUMAN RIGHTS IN DATA PROCESSING IN PUBLIC ADMINISTRATION By Talapina Elvira V.; Yuzhakov Vladimir N.; Chereshneva Irina A.
PRUDEX-Compass: Towards Systematic Evaluation of Reinforcement Learning in Financial Markets By Shuo Sun; Molei Qin; Xinrun Wang; Bo An

A cross-verified database of notable people, 3500BC-2018AD

By:	Morgane Laouenan (CES - Centre d'économie de la Sorbonne - UP1 - Université Paris 1 Panthéon-Sorbonne - CNRS - Centre National de la Recherche Scientifique, LIEPP - Laboratoire interdisciplinaire d'évaluation des politiques publiques (Sciences Po) - Sciences Po - Sciences Po, CNRS - Centre National de la Recherche Scientifique); Palaash Bhargava (Department of Economics Columbia University - Columbia University [New York]); Jean-Benoît Eyméoud (LIEPP - Laboratoire interdisciplinaire d'évaluation des politiques publiques (Sciences Po) - Sciences Po - Sciences Po); Olivier Gergaud (LIEPP - Laboratoire interdisciplinaire d'évaluation des politiques publiques (Sciences Po) - Sciences Po - Sciences Po); Guillaume Plique (médialab - médialab (Sciences Po) - Sciences Po - Sciences Po, Kedge BS - Kedge Business School); Etienne Wasmer (New York University [Abu Dhabi] - NYU - NYU System, LIEPP - Laboratoire interdisciplinaire d'évaluation des politiques publiques (Sciences Po) - Sciences Po - Sciences Po)
Abstract:	A new strand of literature aims at building the most comprehensive and accurate database of notable individuals. We collect a massive amount of data from various editions of Wikipedia and Wikidata. Using deduplication techniques over these partially overlapping sources, we cross-verify each retrieved piece of information. For some variables, Wikipedia adds 15% more information when missing in Wikidata. We find very few errors in the part of the database that contains the most documented individuals but nontrivial error rates in the bottom of the notability distribution, due to sparse information and classification errors or ambiguity. Our strategy results in a cross-verified database of 2.29 million individuals (an elite of 1/43,000 of human beings having ever lived), including a third who are not present in the English edition of Wikipedia. Data collection is driven by specific social science questions on gender, economic growth, urban and cultural development. We document an Anglo-Saxon bias present in the English edition of Wikipedia, and document when it matters and when not.
Date:	2022
URL:	http://d.repec.org/n?u=RePEc:hal:spmain:hal-03930666&r=big

Forecasting realized volatility in turbulent times using temporal fusion transformers

By:	Frank, Johannes
Abstract:	This paper analyzes the performance of temporal fusion transformers in forecasting realized volatilities of stocks listed in the S&P 500 in volatile periods by comparing the predictions with those of state-of-the-art machine learning methods as well as GARCH models. The models are trained on weekly and monthly data based on three different feature sets using varying training approaches including pooling methods. I find that temporal fusion transformers show very good results in predicting financial volatility and outperform long short-term memory networks and random forests when using pooling methods. The use of sectoral pooling substantially improves the predictive performance of all machine learning approaches used. The results are robust to different ways of training the models.
Keywords:	Realized volatility, temporal fusion transformer, long short-term memory network, random forest
JEL:	C45 C53 C58 E44
Date:	2023
URL:	http://d.repec.org/n?u=RePEc:zbw:iwqwdp:032023&r=big

FEDERAL STATE BUDGETARY EDUCATIONAL INSTITUTION OF HIGHER EDUCATION

By:	GRACHEVA V.A. (The Russian Presidential Academy Of National Economy And Public Administration); PETROVA D.A. (The Russian Presidential Academy Of National Economy And Public Administration)
Abstract:	The Internet is a public source of information, where information can be found at minimum search cost. Social media are becoming increasingly popular among web users trying to find and analyze information about the current economic situation. Web users get the opportunity to exchange views or discuss various issues in the news communities of social networks. This information can be used by economic agents to make decisions. Thus, the study of user behavior in social networks makes it possible to identify the expectations and preferences of economic agents. The goal of this study is to assess the expectations and sentiments of economic agents based on textual analysis of social media data. The study addresses the following objectives: Analysis of the mechanisms of influence of the information dissemination and networking effects on the behavior of economic agents; Systematization of the results of theoretical and empirical analysis of the economic agents’ expectations; An overview of machine learning methods used in text processing; Development of an algorithm for identifying sources of information for web scraping and rules for selecting text information to create a body of posts and comments; Collecting a database and preparing posts and comments for text analysis; Application of topic modeling to the identification of topics and keywords in social media data; Assessment of high-frequency indicators of the public sentiment. The subject of the research is a quantitative assessment of the sentiment of web users based on Russian data. The novelty of the study is the assessment of inflation expectations, sentiments in the foreign exchange market and indices of economic conditions using structured and unstructured internet data. Methods: topic modeling; machine learning methods and econometric methods of time series analysis. The study is based on data for Russia in 2014-2021. 
The study shows that social media posts, search queries and online news articles can be good proxy variables for the economic agents’ expectations. We construct three types of public confidence indicators based on internet data: inflation expectations; sentiment in the foreign exchange market and index of economic conditions. The results of econometric analysis indicate that the quality of macroeconomic performance models with sentiment indicators is higher than without these indicators. Additionally, indicators based on VK posts, RBC news articles and Google Trends search queries are more informative compared to comments. The main conclusion of the study is that internet data can improve the quality of macroeconomic performance models. In a further study, we plan to expand the list of indicators of the sentiment of economic agents and to evaluate advanced time series models.
Keywords:	textual analysis; machine learning; inflation expectations; sentiment of economic agents; internet data; topic modeling; index of economic conditions; social networks; search queries.
URL:	http://d.repec.org/n?u=RePEc:rnp:wpaper:w2022057&r=big

The supply, demand and characteristics of the AI workforce across OECD countries

By:	Andrew Green; Lucas Lamby
Abstract:	This report provides representative, cross-country estimates of the artificial intelligence (AI) workforce across OECD countries. The AI workforce is defined as the subset of workers with skills in statistics, computer science and machine learning who could actively develop and maintain AI systems. For countries that wish to be at the forefront of AI development, understanding the AI workforce is crucial to building and nurturing a talent pipeline, and ensuring that those who create AI reflect the diversity of society. This report uses data from online job vacancies to measure the within-occupation intensity of AI skill demand. The within-occupation AI intensity is then weighted to employment by occupation in labour force surveys to provide estimates of the size and growth of the AI workforce over time.
Keywords:	Artificial Intelligence
JEL:	J21 J23 J24 J31 J44
Date:	2023–02–23
URL:	http://d.repec.org/n?u=RePEc:oec:elsaab:287-en&r=big
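
The weighting step described in the abstract above can be illustrated with a minimal sketch. The occupation names, AI-intensity shares and employment counts are made-up placeholders, not figures from the report, and the function name is hypothetical; the report's actual estimator may differ.

```python
# Hypothetical inputs (not from the report): the share of online vacancies in
# each occupation that demand AI skills, and employment by occupation as it
# might be reported in a labour force survey.
ai_intensity = {
    "software developers": 0.10,
    "statisticians": 0.05,
    "accountants": 0.01,
}
employment = {
    "software developers": 200_000,
    "statisticians": 50_000,
    "accountants": 300_000,
}


def estimate_ai_workforce(intensity, employed):
    """Weight within-occupation AI intensity by occupational employment."""
    return sum(intensity[occ] * employed[occ] for occ in intensity)


ai_workers = estimate_ai_workforce(ai_intensity, employment)
total_employment = sum(employment.values())
ai_share = ai_workers / total_employment

print(f"Estimated AI workforce: {ai_workers:,.0f}")
print(f"Share of total employment: {ai_share:.2%}")
```

Repeating the calculation for each year of vacancy and survey data would give the growth of the estimate over time.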

Using supervised machine learning to scale human‐coded data: A method and dataset in the board leadership context

By:	Harrison, Joseph S.; Josefy, Matthew A.; Kalm, Matias (Tilburg University, School of Economics and Management); Krause, Ryan
Date:	2022
URL:	http://d.repec.org/n?u=RePEc:tiu:tiutis:abc9f83d-960e-40c5-ae40-3845302531de&r=big

Age, wealth, and the MPC in Europe: A supervised machine learning approach

By:	Dutt, Satyajit; Radermacher, Jan W.
Abstract:	We investigate consumption patterns in Europe with supervised machine learning methods and reveal differences in age and wealth impact across countries. Using data from the third wave (2017) of the Eurosystem's Household Finance and Consumption Survey (HFCS), we assess how age and (liquid) wealth affect the marginal propensity to consume (MPC) in the Netherlands, Germany, France, and Italy. Our regression analysis takes the specification by Christelis et al. (2019) as a starting point. Decision trees are used to suggest alternative variable splits to create categorical variables for customized regression specifications. The results suggest an impact of differing wealth distributions and retirement systems across the studied Eurozone members and are relevant to European policy makers due to joint Eurozone monetary policy and increasing supranational fiscal authority of the EU. The analysis is further substantiated by a supervised machine learning analysis using a random forest and XGBoost algorithm.
Date:	2023
URL:	http://d.repec.org/n?u=RePEc:zbw:safewp:383&r=big

ddml: Double/Debiased Machine Learning in Stata

By:	Ahrens, Achim (Economic and Social Research Institute, Dublin); Hansen, Christian B. (University of Chicago); Schaffer, Mark E (Heriot-Watt University, Edinburgh); Wiemann, Thomas (University of Chicago)
Abstract:	We introduce the package ddml for Double/Debiased Machine Learning (DDML) in Stata. Estimators of causal parameters for five different econometric models are supported, allowing for flexible estimation of causal effects of endogenous variables in settings with unknown functional forms and/or many exogenous variables. ddml is compatible with many existing supervised machine learning programs in Stata. We recommend using DDML in combination with stacking estimation which combines multiple machine learners into a final predictor. We provide Monte Carlo evidence to support our recommendation.
Keywords:	st0001, causal inference, machine learning, doubly-robust estimation
JEL:	C14 C21 C87
Date:	2023–02
URL:	http://d.repec.org/n?u=RePEc:iza:izadps:dp15963&r=big

Mass Valuation of Real Estate Using GIS-based Nominal Valuation and Machine Learning Methods

By:	Muhammed Oguzhan Mete; Tahsin Yomralioglu
Abstract:	Geographic Information Systems (GIS) and Machine Learning methods are widely used in mass real estate valuation practices. Because these practices focus on the physical attributes of properties, locational criteria are insufficiently used during the price prediction process, even though criteria such as proximity to important places, sea or forest views, and flat topography are spatial factors that strongly affect real estate value. In this study, a hybrid approach is developed by integrating GIS and Machine Learning for automated mass valuation of residential properties in Turkey and the United Kingdom. A GIS-based Nominal Valuation Method was applied to produce a land value map by carrying out proximity, terrain, and visibility analyses. In addition, ensemble regression methods such as XGBoost, CatBoost, LightGBM, and Random Forest were built for price prediction. Spatial criteria scores obtained from the GIS analyses were included in the price prediction data for feature enrichment. Results showed that adding locational factors to the real estate price data increased the prediction accuracy dramatically. They also demonstrated that Random Forest was the most successful regression model compared to the other ensemble methods.
Keywords:	GIS; Machine Learning; Mass Valuation; Real Estate Valuation
JEL:	R3
Date:	2022–01–01
URL:	http://d.repec.org/n?u=RePEc:arz:wpaper:2022_177&r=big

Six questions about the demand for artificial intelligence skills in labour markets

By:	Fabio Manca
Abstract:	This study responds to six key questions about the impact that the demand for Artificial Intelligence (AI) skills is having on labour markets. What are the occupations where AI skills are most relevant? How do different AI-relevant skills combine in job requirements? How quickly is the demand for AI-related skills diffusing across labour markets and what is the relationship between AI skill demands and the demand for cognitive skills across jobs? Finally, are AI skills leading to a wage premium and how different are the wage returns associated with AI and routine skills? To shed light on these aspects, this study leverages Natural Language Processing (NLP) algorithms to analyse the information contained in millions of job postings collected from the internet.
Keywords:	artificial intelligence, education, labour market, skills, technology
JEL:	I26 J01 O14 O33
Date:	2023–02–23
URL:	http://d.repec.org/n?u=RePEc:oec:elsaab:286-en&r=big

ECONOMIC FOUNDATIONS OF COMPETITION POLICY FOR DIGITAL ECOSYSTEMS

By:	N. G. Korol (The Russian Presidential Academy Of National Economy And Public Administration); A. A. Kurdin (The Russian Presidential Academy Of National Economy And Public Administration); A. A. Morosanova (The Russian Presidential Academy Of National Economy And Public Administration)
Abstract:	Digital transformation of industries and markets remains a key challenge for the modern competition policy. Digital ecosystems are playing increasingly important roles in the structure of the economy. That is why their regulation is becoming an especially relevant problem for antitrust regulators. The goal of this preprint is to identify key specific factors of competition restraints on markets for goods and services in the spheres of digital ecosystems functioning. The authors of the research aggregate and compare main concepts and models of digital ecosystems with a focus on procompetitive and anticompetitive factors of their activities. The authors also summarize main issues raised in the process of market behavior qualification and market structure assessment for artificial intelligence (AI) intensive companies (ecosystem leaders). These issues include enhanced market concentration, risks of price discrimination and algorithmic collusion. The main research method in this regard is the legal and economic analysis, which is based on the economic assessment of Russian and foreign legal documents. The specific challenge in that sphere is the dependence of AI efficiency on big-data-based machine learning. This feature causes an increase in market concentration, strengthens the positions of market leaders, and potentially weakens the competitive environment. The results of the research include the systematization of digital ecosystem concepts, the detection of main factors for their modeling and the identification of presumptions and consequences of the modernization of antitrust regulation. Antitrust bodies are recommended to improve their own digital competencies and analytical capabilities to prevent the loss of control over the market, as well as the elimination of AI benefits.
Keywords:	competition policy, antitrust policy, ecosystem, digital economy, platform, network effects, artificial intelligence, machine learning
URL:	http://d.repec.org/n?u=RePEc:rnp:wpaper:w2022049&r=big

Modelling Subjective Attractiveness

By:	Konrad Lewszyk (University of Warsaw, Faculty of Economic Sciences and Data Science Lab WNE UW); Piotr Wójcik (University of Warsaw, Faculty of Economic Sciences and Data Science Lab WNE UW)
Abstract:	Attractive people obtain greater economic and reproductive success. This article attempts to grasp individual preferences of facial attractiveness and create reliable models that will accurately predict a beauty score on a binary and quintary scale. Based on extensive research conducted on factors of attractiveness, we derive the most important facial features that have the highest impact on beauty perception. Using a facial landmark detector on a sample of 681 images of faces, we derive various numerical features representing face characteristics. The application of various machine learning algorithms shows that attractiveness can be predicted accurately based on facial characteristics. In addition, we show that attractiveness is indeed subjective, as the same features have different importance for different subjects.
Keywords:	Attractiveness, beauty-premium, image processing, machine learning, predictive models
JEL:	C40 C53 J71
Date:	2023
URL:	http://d.repec.org/n?u=RePEc:war:wpaper:2023-06&r=big

Fourth Industrial Revolution and Evolution of Data Science: Challenges for Official Statistics

By:	Popoola, Osuolale Peter; Adeboye, Olawale Nureni
Abstract:	The Fourth Industrial Revolution is described as the exponential growth of several key technological fields and concepts, such as intelligent materials, cloud computing, cyber-physical systems, data exchange, the Internet of things and blockchain technology. At its core, data represents a post-industrial opportunity. The effects of these technologies have provided new avenues of data for official statistics, which can then be harnessed through the power of data science. However, as data continue to grow in size and complexity, new algorithms need to be developed so as to learn from diverse data sources. The limitation of conventional statistics in managing and analyzing big data has inspired data analysts to venture into data science. Data science is a combination of multiple disciplines that use statistics, data analysis, and machine learning to analyze data and extract knowledge and insights from it. These swathes of new digital data are valuable for official statistics. This paper links industrial eras to the evolution of statistics and data; it examines the emergence of big data and data science, what it means, and its benefits and challenges for official statistics.
Keywords:	Industrial Eras, Data Evolution, Big Data Revolution, Data Science, Official Statistics
Date:	2023
URL:	http://d.repec.org/n?u=RePEc:zbw:esrepo:268717&r=big

Gender differences in reference letters: Evidence from the Economics job market

By:	Markus Eberhardt; Giovanni Facchini; Valeria Rueda
Abstract:	Academia, and economics in particular, faces increased scrutiny because of gender imbalance. This paper studies the job market for entry-level faculty positions. We employ machine learning methods to analyze gendered patterns in the text of 12,000 reference letters written in support of over 3,700 candidates. Using both supervised and unsupervised techniques, we document widespread differences in the attributes emphasized. Women are systematically more likely to be described using ‘grindstone’ terms and at times less likely to be praised for their ability. Using information on initial placement we highlight the implications of these gendered descriptors for the quality of academic placement.
Keywords:	gender; natural language processing; diversity
Date:	2023
URL:	http://d.repec.org/n?u=RePEc:not:notgep:2023-02&r=big

Data Driven Contagion Risk Management in Low-Income Countries using Machine Learning Applications with COVID-19 in South Asia

By:	Abu S. Shonchoy (Department of Economics, Florida International University); Moogdho M. Mahzab (Stanford University); Towhid I. Mahmood (Texas Tech University); Manhal Ali (University of Leeds)
Abstract:	In the absence of real-time surveillance data, it is difficult to derive an early warning system and potential outbreak locations with the existing epidemiological models, especially in resource-constrained countries. We proposed a Contagion Risk Index (CR-Index), based on publicly available national statistics and founded on communicable disease spreadability vectors. Utilizing the daily COVID-19 data (positive cases and deaths) from 2020-2022, we developed country-specific and sub-national CR-Index for South Asia (India, Pakistan, and Bangladesh) and identified potential infection hotspots, aiding policymakers with efficient mitigation planning. Across the study period, the week-by-week and fixed-effects regression estimates demonstrate a strong correlation between the proposed CR-Index and sub-national (district-level) COVID-19 statistics. We validated the CR-Index using machine learning methods by evaluating the out-of-sample predictive performance. Machine learning driven validation showed that the CR-Index can correctly predict districts with high incidence of COVID-19 cases and deaths more than 85% of the time. This proposed CR-Index is a simple, replicable, and easily interpretable tool that can help low-income countries prioritize resource mobilization to contain the disease spread and associated crisis management with global relevance and applicability. This index can also help to contain future pandemics (and epidemics) and manage their far-reaching adverse consequences.
Date:	2023–02
URL:	http://d.repec.org/n?u=RePEc:fiu:wpaper:2302&r=big

Stock Broad-Index Trend Patterns Learning via Domain Knowledge Informed Generative Network

By:	Jingyi Gu; Fadi P. Deek; Guiling Wang
Abstract:	Predicting stock movement attracts much attention from both industry and academia. Despite such significant efforts, the results remain unsatisfactory due to the inherently complicated nature of the stock market driven by factors including supply and demand, the state of the economy, the political climate, and even irrational human behavior. Recently, Generative Adversarial Networks (GAN) have been extended for time series data; however, robust methods are primarily for synthetic series generation, which fall short for appropriate stock prediction. This is because existing GANs for stock applications suffer from mode collapse and only consider one-step prediction, thus underutilizing the potential of GAN. Furthermore, merging news and market volatility are neglected in current GANs. To address these issues, we exploit expert domain knowledge in finance and, for the first time, attempt to formulate stock movement prediction into a Wasserstein GAN framework for multi-step prediction. We propose IndexGAN, which includes deliberate designs for the inherent characteristics of the stock market, leverages news context learning to thoroughly investigate textual information, and develops an attentive seq2seq learning network that captures the temporal dependency among stock prices, news, and market sentiment. We also utilize the critic to approximate the Wasserstein distance between actual and predicted sequences and develop a rolling strategy for deployment that mitigates noise from the financial market. Extensive experiments are conducted on real-world broad-based indices, demonstrating the superior performance of our architecture over other state-of-the-art baselines and validating all its contributing components.
Date:	2023–02
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2302.14164&r=big

Predicting Firm Exits with Machine Learning: Implications for Selection into COVID-19 Support and Productivity Growth

By:	Lily Davies (CPB Netherlands Bureau for Economic Policy Analysis); Mark Kattenberg (CPB Netherlands Bureau for Economic Policy Analysis); Benedikt Vogt (CPB Netherlands Bureau for Economic Policy Analysis)
Abstract:	Evaluations of support measures for companies often require a good assessment of the viability of firms or the probability that a firm will exit the market. On March 17, 2020, a lockdown and associated social-restriction measures were announced, which hit specific sectors of the economy severely. To compensate companies and the self-employed for the loss of income, an extensive package of support measures was designed. These support measures had hardly any restrictions, because they had to be paid out quickly. This raises the question of whether unhealthy companies have made disproportionate use of support and to what extent these support measures have kept viable or non-viable companies afloat. In this paper, we use machine learning techniques to predict whether a company would have left the market in a world without corona. These predictions show that unhealthy companies applied for support less often than healthy companies. But we also show that the COVID-19 support has prevented most exits among unhealthy companies. This indicates that the corona support measures have had a negative impact on productivity growth.
JEL:	C18 E61 E65 G33
Date:	2023–03
URL:	http://d.repec.org/n?u=RePEc:cpb:discus:444&r=big

Mamma Mia! Revealing hidden heterogeneity by PCA-biplot: MPC puzzle for Italy's elderly poor

By:	Radermacher, Jan W.
Abstract:	I investigate consumption patterns in Italy and use a PCA-biplot to discover a consumption puzzle for the elderly poor. Data from the third wave (2017) of the Eurosystem's Household Finance and Consumption Survey (HFCS) indicate that Italian poor old-aged households boast lower levels of the marginal propensity to consume (MPC) than suggested by the dominant consumption models. A customized regression analysis exhibits group differences with richer peers to be only half as large as prescribed by a traditional linear regression model. This analysis has benefited from a visualization technique for high-dimensional matrices related to the unsupervised machine learning literature. I demonstrate that PCA-biplots are a useful tool to reveal hidden relations and to help researchers to formulate simple research questions. The method is presented in detail and suggestions on incorporating it in the econometric modeling pipeline are given.
Date:	2023
URL:	http://d.repec.org/n?u=RePEc:zbw:safewp:382&r=big

What Makes a Program Good? Evidence from Short-Cycle Higher Education Programs in Five Developing Countries

By:	Marina Bassi; Lelys Dinarte-Diaz; Maria Marta Ferreyra; Sergio Urzua
Abstract:	Short-cycle higher education programs (SCPs) can play a central role in skill development and higher education expansion, yet their quality varies greatly within and among countries. In this paper we explore the relationship between programs’ practices and inputs (quality determinants) and student academic and labor market outcomes. We design and conduct a novel survey to collect program-level information on quality determinants and average outcomes for Brazil, Colombia, Dominican Republic, Ecuador, and Peru. Categories of quality determinants include training and curriculum, infrastructure, faculty, link with productive sector, costs and funding, and practices on student admission and institutional governance. We also collect administrative, student-level data on higher education and formal employment for SCP students in Brazil and Ecuador and match it to survey data. Using machine learning methods, we select the quality determinants that predict outcomes at the program and student levels. Estimates indicate that some quality determinants may favor academic and labor market outcomes while others may hinder them. Two practices (teaching numerical competencies and providing job market information) predict improvements in all labor market outcomes in Brazil and Ecuador, and one practice (teaching numerical competencies) additionally predicts improvements in labor market outcomes for all survey countries. Since quality determinants account for 20-40 percent of the explained variation in student-level outcomes, quality determinants might have a role in shrinking program quality gaps. Findings have implications for the design and replication of high-quality SCPs, their regulation, and the development of information systems.
Keywords:	higher education, short-cycle degrees, quality
JEL:	I22 I23 I26 J24
Date:	2023
URL:	http://d.repec.org/n?u=RePEc:ces:ceswps:_10255&r=big

The Transformation of Public Policy Analysis in Times of Crisis – A Microsimulation-Nowcasting Method Using Big Data

By:	O'Donoghue, Cathal (National University of Ireland, Galway); Sologon, Denisa Maria (LISER (CEPS/INSTEAD))
Abstract:	The urgency of the two crises, especially the COVID-19 pandemic, revealed the inadequacy of traditional statistical datasets and models to provide timely support to the decision-making process in times of volatility. Drawing upon advances in data analytics for public policy and the increasing availability of real-time data, we develop and evaluate a method for real-time policy evaluations of tax and social protection policies. Our method goes beyond the state-of-the-art by implementing an aligned or calibrated microsimulation approach to generate a counterfactual income distribution as a function of more timely external data than the underlying income survey. We evaluate the simulation performance between our approach and the transition matrix approach by undertaking a nowcast for a historical crisis, judging against an actual change and each other. Nowcasting emerges as a useful methodology for examining up-to-date statistics on labour force participation, income distribution, prices, and income inequality. We find significant differences between approaches when the calibration involves structural heterogeneous changes. The model replicates the changes in income distribution over one year; over the longer term, the model is able to capture the trend, but the precision of the levels weakens the further we get from the estimation year.
Keywords:	big data, policy analysis, nowcasting, microsimulation, COVID-19
JEL:	I31 I38 C54
Date:	2023–02
URL:	http://d.repec.org/n?u=RePEc:iza:izadps:dp15937&r=big

Sentiment Spin: Attacking Financial Sentiment with GPT-3

By:	Markus Leippold (University of Zurich; Swiss Finance Institute)
Abstract:	The use of dictionaries in financial sentiment analysis and other financial and economic applications remains widespread because keyword-based methods appear more transparent and explainable than more advanced techniques commonly used in computer science. However, this paper demonstrates the vulnerability of using dictionaries by exploiting the eloquence of GPT-3, a sophisticated transformer model, to generate successful adversarial attacks on keyword-based approaches with a success rate close to 99% for negative sentences in the financial phrase base, a well-known human-annotated database for financial sentiment analysis. In contrast, more advanced methods, such as those using context-aware approaches like BERT, remain robust.
Keywords:	sentiment analysis in financial markets, keyword-based approach, FinBERT, GPT-3
JEL:	G2 G38 C8 M48
Date:	2023–02
URL:	http://d.repec.org/n?u=RePEc:chf:rpseri:rp2311&r=big
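
The weakness of keyword counting described in the abstract above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the word lists are hypothetical toy lists (not a real financial sentiment dictionary), the paraphrase is written by hand (not generated by GPT-3), and the scoring rule is not the paper's method.

```python
import re

# Hypothetical toy word lists; NOT a real financial sentiment dictionary.
NEGATIVE = {"fell", "loss", "losses", "decline", "declined", "dropped", "weak"}
POSITIVE = {"rose", "gain", "gains", "growth", "improved", "strong"}


def keyword_sentiment(sentence: str) -> int:
    """Return (number of positive keyword hits) minus (number of negative hits)."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    positive_hits = sum(token in POSITIVE for token in tokens)
    negative_hits = sum(token in NEGATIVE for token in tokens)
    return positive_hits - negative_hits


# A negative sentence and a hand-written paraphrase that avoids listed keywords.
original = "Operating profit fell sharply and losses widened."
paraphrase = (
    "Operating profit came in well under last year's level "
    "and the deficit widened."
)

print(keyword_sentiment(original))    # two negative keywords are matched
print(keyword_sentiment(paraphrase))  # no keywords are matched
```

The paraphrase carries the same negative message but scores as neutral because none of its words appear in the lists. A context-aware classifier is not tied to a fixed vocabulary in this way.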

Extraterrestrial Artificial Intelligence: The Final Existential Risk?

By:	Naudé, Wim (RWTH Aachen University)
Abstract:	The possibility that artificial extraterrestrial intelligence poses an existential threat to humanity is neglected. It is also the case in economics, where both AI existential risks and the potential long-term consequences of an AGI are neglected. This paper presents a thought experiment to address these lacunas. It is argued that it is likely that any advanced extraterrestrial civilization that we may encounter will be an AGI, and such an AGI will pose an existential risk. Two arguments are advanced for why this is the case. One draws on the Dark Forest Hypothesis and another on the Galactic Colonization Imperative. Three implications for how we govern AI and insure against potential existential risks follow. These are (i) accelerating the development of AI as a precautionary step; (ii) maintaining economic growth until we attain the wealth and technological levels to create AGI and expand into the galaxy; and (iii) putting more research and practical effort into solving the Fermi Paradox. Several areas where economists can contribute to these three implications are identified.
Keywords:	technology, artificial intelligence, existential risk, Fermi paradox, Grabby Aliens
JEL:	O40 O33 D01 D64
Date:	2023–02
URL:	http://d.repec.org/n?u=RePEc:iza:izadps:dp15924&r=big

THE RELEVANT SCIENTIFIC AND METHODOLOGICAL APPROACHES TO ENSURING FUNDAMENTAL HUMAN RIGHTS IN DATA PROCESSING IN PUBLIC ADMINISTRATION

By:	Talapina Elvira V. (The Russian Presidential Academy Of National Economy And Public Administration); Yuzhakov Vladimir N. (The Russian Presidential Academy Of National Economy And Public Administration); Chereshneva Irina A. (The Russian Presidential Academy Of National Economy And Public Administration)
Abstract:	Enforcing fundamental human rights is a constitutional obligation of the Russian Federation. In the digital age, the risks of human rights violations are increasing, making it increasingly relevant to implement a consistent state policy related specifically to data processing in public administration. The objective of this paper is to analyze the scientific and methodological approaches to enforcing fundamental human rights in data processing in public administration and the possibilities for consideration of human rights in the Russian public administration. The subject of the study includes scientific publications, court cases, and international and national laws and regulations, including those of foreign countries. The study uses formal legal and historical legal methods, the comparative legal method, the method of legal interpretation, logical analysis, and general scientific methods of classification and modeling. The results of the study are an analytical review of foreign and Russian scientific and methodological approaches to enforcing fundamental human rights in data processing in digital public administration; a systematization of the basic legal grounds for enforcing fundamental human rights in data processing in public administration; and proposals for legal enforcement of fundamental human rights in data processing in the Russian public administration. The study allows drawing conclusions about the lack of attention in the Russian doctrine and practice to the issue of fundamental rights. Since Russian legislation in this area is based on the European model, the implementation of European approaches to data protection is recommended. It is necessary to create special legislation on data processing in public administration, to define the criteria for proper data processing and data storage, to differentiate the types of data collection and data processing, and to ensure transparency.
Based on advanced digital technologies, data processing rules in the public sector should be stricter and more transparent than in the private sector. The scientific novelty of the research is determined by insufficient regulation of data processing in public administration, where the constitutional function of enforcement and protection of fundamental human rights is not sufficiently regulated. Based on the results of the study, recommendations are to use the findings in the formation of an appropriate state policy to enforce fundamental human rights in data processing in the Russian public administration.
Keywords:	Public administration, big data, data, personal data, digitalization, human rights, data processing, artificial intelligence, algorithm
URL:	http://d.repec.org/n?u=RePEc:rnp:wpaper:w2022054&r=big

PRUDEX-Compass: Towards Systematic Evaluation of Reinforcement Learning in Financial Markets

By:	Shuo Sun; Molei Qin; Xinrun Wang; Bo An
Abstract:	The financial markets, which involve more than $90 trillion in market capitalization, attract the attention of innumerable investors around the world. Recently, reinforcement learning in financial markets (FinRL) has emerged as a promising direction to train agents for making profitable investment decisions. However, the evaluation of most FinRL methods focuses only on profit-related measures, which are far from satisfactory for practitioners to deploy these methods into real-world financial markets. Therefore, we introduce PRUDEX-Compass, which has 6 axes, i.e., Profitability, Risk-control, Universality, Diversity, rEliability, and eXplainability, with a total of 17 measures for a systematic evaluation. Specifically, i) we propose AlphaMix+ as a strong FinRL baseline, which leverages Mixture-of-Experts (MoE) and risk-sensitive approaches to make diversified risk-aware investment decisions, ii) we evaluate 8 widely used FinRL methods in 4 long-term real-world datasets of influential financial markets to demonstrate the usage of our PRUDEX-Compass, iii) PRUDEX-Compass, together with 4 real-world datasets, standard implementations of 8 FinRL methods and a portfolio management RL environment, is released as a public resource to facilitate the design and comparison of new FinRL methods. We hope that PRUDEX-Compass can shed light on future FinRL research and help prevent untrustworthy results from holding back the successful industry deployment of FinRL.
Date:	2023–01
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2302.00586&r=big

This nep-big issue is ©2023 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.

General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.

NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.