nep-big 2022-06-20 papers

on Big Data

Issue of 2022‒06‒20
25 papers chosen by
Tom Coupé
University of Canterbury

Deep learning based Chinese text sentiment mining and stock market correlation research By Chenrui Zhang
Machine Learning for Economists: An Introduction By Sonan Memon
Machine learning techniques in joint default assessment By Margherita Doria; Elisa Luciano; Patrizia Semeraro
AI Watch: Revisiting Technology Readiness Levels for relevant Artificial Intelligence technologies By MARTINEZ PLUMED Fernando; CABALLERO BENÍTEZ Fernando; CASTELLANO FALCÓN David; FERNANDEZ LLORCA David; GOMEZ Emilia; HUPONT TORRES Isabelle; MERINO Luis; MONSERRAT Carlos; HERNÁNDEZ ORALLO José
Calibrating for Class Weights by Modeling Machine Learning By Andrew Caplin; Daniel Martin; Philip Marx
A Word is Worth A Thousand Dollars: Adversarial Attack on Tweets Fools Stock Prediction By Yong Xie; Dakuo Wang; Pin-Yu Chen; Jinjun Xiong; Sijia Liu; Sanmi Koyejo
Machine Learning Methods: Potential for Deposit Insurance By Ryan Defina
HARNet: A convolutional neural network for realized volatility forecasting By Reisenhofer, Rafael; Bayer, Xandro; Hautsch, Nikolaus
Understanding Building Energy Efficiency with Administrative and Emerging Urban Big Data by Deep Learning in Glasgow By Sun, Maoran; Han, Changyu; Nie, Quan; Xu, Jingying; Zhang, Fan; Zhao, Qunshan
Artificial intelligence-based human–computer interaction technology applied in consumer behavior analysis and experiential education By Li, Yanmin; Zhong, Ziqi; Zhang, Fengrui; Zhao, Xinjie
Venture Capital (Mis)allocation in the Age of AI By Lyonnet, Victor; Stern, Lea H.
The Prediction of Hypertension Risk By Massaro, Alessandro; Giardinelli, Vito O. M.; Cosoli, Gabriele; Magaletti, Nicola; Leogrande, Angelo
Differentiating artificial intelligence capability clusters in Australia By Bratanova, Alexandra; Pham, Hien; Mason, Claire; Hajkowicz, Stefan; Naughtin, Claire; Schleiger, Emma; Sanderson, Conrad; Chen, Caron; Karimi, Sarvnaz
Innovative SMEs Collaborating with Others in Europe By Leogrande, Angelo; Costantiello, Alberto; Laureti, Lucio; Matarrese, Marco Maria
AI-tocracy By Martin Beraja; Andrew Kao; David Y. Yang; Noam Yuchtman
People versus machines: introducing the HIRE framework By Will, Paris; Krpan, Dario; Lordan, Grace
Automation and the changing nature of work By Josten, Cecily; Lordan, Grace
Three Families of Automated Text Analysis By van Loon, Austin
The canonical correlation complexity method By Nomaler, Önder; Verspagen, Bart
A time-varying study of Chinese investor sentiment, stock market liquidity and volatility: Based on deep learning BERT model and TVP-VAR model By Chenrui Zhang; Xinyi Wu; Hailu Deng; Huiwei Zhang
AI Watch: Estimating AI investments in the European Union By Tatjana Evas; Maikki Sipinen; Martin Ulbrich; Alessandro Dalla Benetta; Maciej Sobolewski; Daniel Nepelski
Aligned with Whom? Direct and Social Goals for AI Systems By Anton Korinek; Avital Balwit
Paying over the odds at the end of the fiscal year. Evidence from Ukraine By Margaryta Klymak; Stuart Baumann
Integration of Behavioral Economic Models to Optimize ML performance and interpretability: a sandbox example By Emilio Soria-Olivas; Jos\'e E. Vila Gisbert; Regino Barranquero Carde\~nosa; Yolanda Gomez
"Daily Growth at Risk: financial or real drivers? The answer is not always the same". By Helena Chuliá; Ignacio Garrón; Jorge M. Uribe

Deep learning based Chinese text sentiment mining and stock market correlation research

By:	Chenrui Zhang
Abstract:	We explore how to crawl financial forum data such as stock bars and combine them with deep learning models for sentiment analysis. In this paper, we will use the BERT model to train against the financial corpus and predict the SZSE Component Index, and find that applying the BERT model to the financial corpus through the maximum information coefficient comparison study. The obtained sentiment features will be able to reflect the fluctuations in the stock market and help to improve the prediction accuracy effectively. Meanwhile, this paper combines deep learning with financial text, in further exploring the mechanism of investor sentiment on stock market through deep learning method, which will be beneficial for national regulators and policy departments to develop more reasonable policy guidelines for maintaining the stability of stock market.
Date:	2022–05
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2205.04743&r=

Machine Learning for Economists: An Introduction

By:	Sonan Memon (Pakistan Institute of Development Economics)
Abstract:	Machine Learning (henceforth ML) refers to the set of algorithms and computational methods which enable computers to learn patterns from training data without being explicitly programmed to do so.[1] ML uses training data to learn patterns by estimating a mathematical model and making predictions in out of sample based on new or unseen input data. ML has the tremendous capacity to discover complex, flexible and crucially generalisable structure in training data.
Keywords:	Machine Learning, Economists, Introduction
Date:	2021
URL:	http://d.repec.org/n?u=RePEc:pid:kbrief:2021:33&r=

Machine learning techniques in joint default assessment

By:	Margherita Doria; Elisa Luciano; Patrizia Semeraro
Abstract:	This paper studies the consequences of capturing non-linear dependence among the covariates that drive the default of different obligors and the overall riskiness of their credit portfolio. Joint default modeling is, without loss of generality, the classical Bernoulli mixture model. Using an application to a credit card dataset we show that, even when Machine Learning techniques perform only slightly better than Logistic Regression in classifying individual defaults as a function of the covariates, they do outperform it at the portfolio level. This happens because they capture linear and non-linear dependence among the covariates, whereas Logistic Regression only captures linear dependence. The ability of Machine Learning methods to capture non-linear dependence among the covariates produces higher default correlation compared with Logistic Regression. As a consequence, on our data, Logistic Regression underestimates the riskiness of the credit portfolio.
Date:	2022–05
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2205.01524&r=

AI Watch: Revisiting Technology Readiness Levels for relevant Artificial Intelligence technologies

By:	MARTINEZ PLUMED Fernando (European Commission - JRC); CABALLERO BENÍTEZ Fernando; CASTELLANO FALCÓN David; FERNANDEZ LLORCA David (European Commission - JRC); GOMEZ Emilia (European Commission - JRC); HUPONT TORRES Isabelle (European Commission - JRC); MERINO Luis; MONSERRAT Carlos; HERNÁNDEZ ORALLO José
Abstract:	Artificial intelligence (AI) offers the potential to transform our lives in radical ways. However, we lack the tools to determine which achievements will be attained in the near future. Also, we usually underestimate which various technologies in AI are capable of today. This report constitutes the second edition of a study proposing an example-based methodology to categorise and assess several AI technologies, by mapping them onto Technology Readiness Levels (TRL) (e.g., maturity and availability levels). We first interpret the nine TRLs in the context of AI and identify different categories in AI to which they can be assigned. We then introduce new bidimensional plots, called readiness-vs-generality charts, where we see that higher TRLs are achievable for low-generality technologies focusing on narrow or specific abilities, while high TRLs are still out of reach for more general capabilities. In an incremental way, this edition builds on the first report on the topic by updating the assessment of the original set of AI technologies and complementing it with an analysis of new AI technologies. We include numerous examples of AI technologies in a variety of fields and show their readiness-vs-generality charts, serving as a base for a broader discussion of AI technologies. Finally, we use the dynamics of several AI technologies at different generality levels and moments of time to forecast some short-term and mid-term trends for AI.
Keywords:	Artificial Intelligence, Technology Readiness Level, AI technology, evaluation, machine learning, recommender systems, expert systems, apprentice by demonstration, audio-visual content generation, machine translation, speech recognition, massive multi-modal models, facial recognition, text recognition, transport scheduling systems, self-driving cars, home cleaning robots, logistic robots, negotiation agents, virtual assistants, risks
Date:	2022–05
URL:	http://d.repec.org/n?u=RePEc:ipt:iptwpa:jrc129399&r=

Calibrating for Class Weights by Modeling Machine Learning

By:	Andrew Caplin; Daniel Martin; Philip Marx
Abstract:	A much studied issue is the extent to which the confidence scores provided by machine learning algorithms are calibrated to ground truth probabilities. Our starting point is that calibration is seemingly incompatible with class weighting, a technique often employed when one class is less common (class imbalance) or with the hope of achieving some external objective (cost-sensitive learning). We provide a model-based explanation for this incompatibility and use our anthropomorphic model to generate a simple method of recovering likelihoods from an algorithm that is miscalibrated due to class weighting. We validate this approach in the binary pneumonia detection task of Rajpurkar, Irvin, Zhu, et al. (2017).
Date:	2022–05
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2205.04613&r=

A Word is Worth A Thousand Dollars: Adversarial Attack on Tweets Fools Stock Prediction

By:	Yong Xie; Dakuo Wang; Pin-Yu Chen; Jinjun Xiong; Sijia Liu; Sanmi Koyejo
Abstract:	More and more investors and machine learning models rely on social media (e.g., Twitter and Reddit) to gather real-time information and sentiment to predict stock price movements. Although text-based models are known to be vulnerable to adversarial attacks, whether stock prediction models have similar vulnerability is underexplored. In this paper, we experiment with a variety of adversarial attack configurations to fool three stock prediction victim models. We address the task of adversarial generation by solving combinatorial optimization problems with semantics and budget constraints. Our results show that the proposed attack method can achieve consistent success rates and cause significant monetary loss in trading simulation by simply concatenating a perturbed but semantically similar tweet.
Date:	2022–05
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2205.01094&r=

Machine Learning Methods: Potential for Deposit Insurance

By:	Ryan Defina (International Association of Deposit Insurers)
Abstract:	The field of deposit insurance is yet to realise fully the potential of machine learning, and the substantial benefits that it may present to its operational and policy-oriented activities. There are practical opportunities available (some specified in this paper) that can assist in improving deposit insurersâ€™ relationship with the technology. Sharing of experiences and learnings via international engagement and collaboration is fundamental in developing global best practices in this space.
Keywords:	deposit insurance, bank resolution
JEL:	G21 G33
Date:	2021–09
URL:	http://d.repec.org/n?u=RePEc:awl:finbri:3&r=

HARNet: A convolutional neural network for realized volatility forecasting

By:	Reisenhofer, Rafael; Bayer, Xandro; Hautsch, Nikolaus
Abstract:	Despite the impressive success of deep neural networks in many application areas, neural network models have so far not been widely adopted in the context of volatility forecasting. In this work, we aim to bridge the conceptual gap between established time series approaches, such as the Heterogeneous Autoregressive (HAR) model (Corsi, 2009), and state-of-the-art deep neural network models. The newly introduced HARNet is based on a hierarchy of dilated convolutional layers, which facilitates an exponential growth of the receptive field of the model in the number of model parameters. HARNets allow for an explicit initialization scheme such that before optimization, a HARNet yields identical predictions as the respective baseline HAR model. Particularly when considering the QLIKE error as a loss function, we find that this approach significantly stabilizes the optimization of HARNets. We evaluate the performance of HARNets with respect to three different stock market indexes. Based on this evaluation, we formulate clear guidelines for the optimization of HARNets and show that HARNets can substantially improve upon the forecasting accuracy of their respective HAR baseline models. In a qualitative analysis of the filter weights learnt by a HARNet, we report clear patterns regarding the predictive power of past information. Among information from the previous week, yesterday and the day before, yesterday's volatility makes by far the most contribution to today's realized volatility forecast. Moroever, within the previous month, the importance of single weeks diminishes almost linearly when moving further into the past.
Date:	2022
URL:	http://d.repec.org/n?u=RePEc:zbw:cfswop:680&r=

Understanding Building Energy Efficiency with Administrative and Emerging Urban Big Data by Deep Learning in Glasgow

By:	Sun, Maoran; Han, Changyu; Nie, Quan; Xu, Jingying; Zhang, Fan; Zhao, Qunshan
Abstract:	With buildings consuming nearly 40% of energy in developed countries, it is important to accurately estimate and understand the building energy efficiency in a city. In this research, we propose a deep learning-based multi-source data fusion framework to estimate building energy efficiency. We consider the traditional factors associated with the building energy efficiency from the energy performance certificate for 160,000 properties (30,000 buildings) in Glasgow, UK (e.g., property structural attributes and morphological attributes), as well as the Google Street View (GSV) building façade images as a complement. We compare the performance improvements between our data-fusion framework with traditional morphological attributes and image-only models. The results show that including the building façade images from GSV, the overall model accuracy increases from 79.7% to 86.8%. A further investigation and explanation of the deep learning model are conducted to understand the relationships between building features and building energy efficiency by using Shapley Additive explanations (SHAP). Our research demonstrates the potential of using multi-source data in building energy efficiency prediction to help understand building energy efficiency at the city level to help achieve the net-zero target by 2050.
Date:	2022–05–04
URL:	http://d.repec.org/n?u=RePEc:osf:osfxxx:g8p4f&r=

Artificial intelligence-based human–computer interaction technology applied in consumer behavior analysis and experiential education

By:	Li, Yanmin; Zhong, Ziqi; Zhang, Fengrui; Zhao, Xinjie
Abstract:	In the course of consumer behavior, it is necessary to study the relationship between the characteristics of psychological activities and the laws of behavior when consumers acquire and use products or services. With the development of the Internet and mobile terminals, electronic commerce (E-commerce) has become an important form of consumption for people. In order to conduct experiential education in E-commerce combined with consumer behavior, courses to understand consumer satisfaction. From the perspective of E-commerce companies, this study proposes to use artificial intelligence (AI) image recognition technology to recognize and analyze consumer facial expressions. First, it analyzes the way of human–computer interaction (HCI) in the context of E-commerce and obtains consumer satisfaction with the product through HCI technology. Then, a deep neural network (DNN) is used to predict the psychological behavior and consumer psychology of consumers to realize personalized product recommendations. In the course education of consumer behavior, it helps to understand consumer satisfaction and make a reasonable design. The experimental results show that consumers are highly satisfied with the products recommended by the system, and the degree of sanctification reaches 93.2%. It is found that the DNN model can learn consumer behavior rules during evaluation, and its prediction effect is increased by 10% compared with the traditional model, which confirms the effectiveness of the recommendation system under the DNN model. This study provides a reference for consumer psychological behavior analysis based on HCI in the context of AI, which is of great significance to help understand consumer satisfaction in consumer behavior education in the context of E-commerce.
Keywords:	behavior analysis; customer psychology; deep neutral network; human-computer interaction; image recognition
JEL:	L81
Date:	2022–04–06
URL:	http://d.repec.org/n?u=RePEc:ehl:lserod:115047&r=

Venture Capital (Mis)allocation in the Age of AI

By:	Lyonnet, Victor (Ohio State University); Stern, Lea H. (University of Washington)
Abstract:	We use machine learning to study how venture capitalists (VCs) make investment decisions. Using a large administrative data set on French entrepreneurs that contains VC-backed as well as non-VC-backed firms, we use algorithmic predictions of new venturesâ€™ performance to identify the most promising ventures. We find that VCs invest in some firms that perform predictably poorly and pass on others that perform predictably well. Consistent with models of stereotypical thinking, we show that VCs select entrepreneurs whose characteristics are representative of the most successful entrepreneurs (i.e., characteristics that occur more frequently among the best performing entrepreneurs relative to the other ones). Although VCs rely on accurate stereotypes, they make prediction errors as they exaggerate some representative features of success in their selection of entrepreneurs (e.g., male, highly educated, Paris-based, and high-tech entrepreneurs). Overall, algorithmic decision aids show promise to broaden the scope of VCsâ€™ investments and founder diversity.
JEL:	D8 D83 G11 G24 G41 M13
Date:	2022–02
URL:	http://d.repec.org/n?u=RePEc:ecl:ohidic:2022-02&r=

The Prediction of Hypertension Risk

By:	Massaro, Alessandro; Giardinelli, Vito O. M.; Cosoli, Gabriele; Magaletti, Nicola; Leogrande, Angelo
Abstract:	This article presents an estimation of the hypertension risk based on a dataset on 1007 individuals. The application of a Tobit Model shows that “Hypertension” is positively associated to “Age”, “BMI-Body Mass Index”, and “Heart Rate”. The data show that the element that has the greatest impact in determining inflation risk is “BMI-Body Mass Index”. An analysis was then carried out using the fuzzy c-Means algorithm optimized with the use of the Silhouette coefficient. The result shows that the optimal number of clusters is 9. A comparison was then made between eight different machine-learning algorithms for predicting the value of the Hypertension Risk. The best performing algorithm is the Gradient Boosted Trees Regression according to the analyzed dataset. The results show that there are 37 individuals who have a predicted hypertension value greater than 0.75, 35 individuals who have a predicted hypertension value between 0.5 and 0.75, while 227 individuals have a hypertension value between 0.0 and 0.5 units.
Keywords:	Predictions, Machine Learning Algorithms, Correlation Matrix, Tobit Model, Fuzzy c-Means Clustering.
JEL:	C00 C01 C02 C50 C80
Date:	2022–05–30
URL:	http://d.repec.org/n?u=RePEc:pra:mprapa:113242&r=

Differentiating artificial intelligence capability clusters in Australia

By:	Bratanova, Alexandra; Pham, Hien; Mason, Claire; Hajkowicz, Stefan; Naughtin, Claire; Schleiger, Emma; Sanderson, Conrad; Chen, Caron; Karimi, Sarvnaz
Abstract:	We demonstrate how cluster analysis underpinned by analysis of revealed technology advantage can be used to differentiate geographic regions with comparative advantage in artificial intelligence (AI). Our analysis uses novel datasets on Australian AI businesses, intellectual property patents and labour markets to explore location, concentration and intensity of AI activities across 333 geographical regions. We find that Australia's AI business and innovation activity is clustered in geographic locations with higher investment in research and development. Through cluster analysis we identify three tiers of AI capability regions that are developing across the economy: ‘AI hotspots’ (10 regions), ‘Emerging AI regions’ (85 regions) and ‘Nascent AI regions’ (238 regions). While the AI hotspots are mainly concentrated in central business district locations, there are examples when they also appear outside CBD in areas where there has been significant investment in innovation and technology hubs. Policy makers can use the results of this study to facilitate and monitor the growth of AI capability to boost economic recovery. Investors may find these results helpful to learn about the current landscape of AI business and innovation activities in Australia.
Keywords:	Artificial intelligence, cluster, revealed technology advantage, regional innovation, Australia
JEL:	O31 O33 O38 R12
Date:	2022–05–31
URL:	http://d.repec.org/n?u=RePEc:pra:mprapa:113237&r=

Innovative SMEs Collaborating with Others in Europe

By:	Leogrande, Angelo; Costantiello, Alberto; Laureti, Lucio; Matarrese, Marco Maria
Abstract:	The following article investigates the determinants that lead innovative SMEs to collaborate. Data from 36 European countries is analyzed using Panel Data with Fixed Effects, Panel Data with Random Effects, Pooled OLS, WLS and Dynamic Panel models. The analysis shows that the ability of innovative SMEs to collaborate is positively associated with the following variables: "Linkages", "Share High and Medium high-tech manufacturing", "Finance and Support", "Broadband Penetration", "Non-R&D Innovation Expenditure" and negatively to the following variables: "New Doctorate graduates", "Venture Capital", "Foreign Controlled Enterprises Share of Value Added", "Public-Private Co-Publications", "Population Size", "Private co-funding of Public R&D expenditures". A clustering with k-Means algorithm optimized by the Silhouette coefficient was then performed and four clusters were found. A network analysis was then carried out and the result shows the presence of three composite structures of links between some European countries. Furthermore, a comparison was made between eight different predictive machine learning algorithms and the result shows that the Random Forest Regression algorithm performs better and predicts a reduction in the ability of innovative SMEs to collaborate equal to an average of 4.4%. Later a further comparison is made with augmented data. The results confirm that the best predictive algorithm is Random Forest Regression, the statistical errors of the prediction decrease on average by 73.5%, and the ability of innovative SMEs to collaborate is predicted to growth by 9.2%.
Keywords:	Innovation, and Invention: Processes and Incentives; Management of Technological Innovation and R&D; Diffusion Processes; Open Innovation
JEL:	O30 O31 O32 O33 O34
Date:	2022–05–09
URL:	http://d.repec.org/n?u=RePEc:pra:mprapa:113008&r=

AI-tocracy

By:	Martin Beraja; Andrew Kao; David Y. Yang; Noam Yuchtman
Abstract:	Can frontier innovation be sustained under autocracy? We argue that innovation and autocracy can be mutually reinforcing when: (i) the new technology bolsters the autocrat's power; and (ii) the autocrat's demand for the technology stimulates further innovation in applications beyond those benefiting it directly. We test for such a mutually reinforcing relationship in the context of facial recognition AI in China. To do so, we gather comprehensive data on AI firms and government procurement contracts, as well as on social unrest across China during the last decade. We first show that autocrats benefit from AI: local unrest leads to greater government procurement of facial recognition AI, and increased AI procurement suppresses subsequent unrest. We then show that AI innovation benefits from autocrats' suppression of unrest: the contracted AI firms innovate more both for the government and commercial markets. Taken together, these results suggest the possibility of sustained AI innovation under the Chinese regime: AI innovation entrenches the regime, and the regime's investment in AI for political control stimulates further frontier innovation.
Keywords:	artificial intelligence, autocracy, innovation, data, China, surveillance, political unrest
Date:	2021–11–02
URL:	http://d.repec.org/n?u=RePEc:cep:cepdps:dp1811&r=

People versus machines: introducing the HIRE framework

By:	Will, Paris; Krpan, Dario; Lordan, Grace
Abstract:	The use of Artificial Intelligence (AI) in the recruitment process is becoming a more common method for organisations to hire new employees. Despite this, there is little consensus on whether AI should have widespread use in the hiring process, and in which contexts. In order to bring more clarity to research findings, we propose the HIRE (Human, (Artificial) Intelligence, Recruitment, Evaluation) framework with the primary aim of evaluating studies which investigate how Artificial Intelligence can be integrated into the recruitment process with respect to gauging whether AI is an adequate, better, or worse substitute for human recruiters. We illustrate the simplicity of this framework by conducting a systematic literature review on the empirical studies assessing AI in the recruitment process, with 22 final papers included. The review shows that AI is equal to or better than human recruiters when it comes to efficiency and performance. We also find that AI is mostly better than humans in improving diversity. Finally, we demonstrate that there is a perception among candidates and recruiters that AI is worse than humans. Overall, we conclude based on the evidence, that AI is equal to or better to humans when utilised in the hiring process, however, humans hold a belief of their own superiority. Our aim is that future authors adopt the HIRE framework when conducting research in this area to allow for easier comparability, and ideally place the HIRE framework outcome of AI being better, equal, worse, or unclear in the abstract.
Keywords:	artificial intelligence; recruitment; hiring; diversity; This work was funded by The Inclusion Initiative at the London School of Economics and Political Science.; Springer deal
JEL:	J50
Date:	2022–05–06
URL:	http://d.repec.org/n?u=RePEc:ehl:lserod:115006&r=

Automation and the changing nature of work

By:	Josten, Cecily; Lordan, Grace
Abstract:	This study identifies the job attributes, and in particular skills and abilities, which predict the likelihood a job is recently automatable drawing on the Josten and Lordan (2020) classification of automatability, EU labour force survey data and a machine learning regression approach. We find that skills and abilities which relate to non-linear abstract thinking are those that are the safest from automation. We also find that jobs that require 'people' engagement interacted with 'brains' are also less likely to be automated. The skills that are required for these jobs include soft skills. Finally, we find that jobs that require physically making objects or physicality more generally are most likely to be automated unless they involve interaction with 'brains' and/or 'people'.
JEL:	J50
Date:	2022–05–05
URL:	http://d.repec.org/n?u=RePEc:ehl:lserod:115117&r=

Three Families of Automated Text Analysis

By:	van Loon, Austin
Abstract:	Since the beginning of this millennium, data in the form of human-generated text in a machine-readable format has become increasingly available to social scientists, presenting a unique window into social life. However, harnessing vast quantities of this highly unstructured data in a systematic way presents a unique combination of analytical and methodological challenges. Luckily, our understanding of how to overcome these challenges has also developed greatly over this same period. In this article, I present a novel typology of the methods social scientists have used to analyze text data at scale in the interest of testing and developing social theory. I describe three “families” of methods: analyses of (1) term frequency, (2) document structure, and (3) semantic similarity. For each family of methods, I discuss their logical and statistical foundations, analytical strengths and weaknesses, as well as prominent variants and applications.
Date:	2022–05–07
URL:	http://d.repec.org/n?u=RePEc:osf:socarx:htnej&r=

The canonical correlation complexity method

By:	Nomaler, Önder (UNU-MERIT, Maastricht University); Verspagen, Bart (UNU-MERIT, Maastricht University)
Abstract:	A relatively recent, yet rapidly proliferating strand of literature in the so-called econophysics domain, known as 'economic complexity' , introduces a toolkit to analyse the relationship between specialization, diversification, and economic development. Different methods that aim at reducing the high dimensionality in data on the empirical patterns of co-location (be it nations or regions) of specializations have been proposed. In terms of the concepts of machine learning, the existing algorithms follow the framework of 'unsupervised learning'. The competing alternatives (e.g., Hidalgo and Hausmann, 2009 vs. Tacchella et al, 2012) have been based on very different assessments of which products depend on more complex capabilities, and accordingly yield highly different estimations of complexity at the product level. The approach that we developed avoids this algorithmic 'confusion' by drawing on a toolkit of more transparent and long-established methods that follow the 'supervised learning' principle where the data on trade/specialization and development are processed together from the very beginning in order to identify the patterns of mutual association. The first pillar of the toolkit, Principal Component Analysis (PCA), serves dimensionality reduction in co-location information. The second pillar, Canonical Correlation Analysis (CCA), identifies the mutual-association between the various patterns of (co-)specialization and more-than-one dimension of economic development. This way, we are able to identify the products or technologies that can be associated with the level or the growth rate of per capita GDP and CO2 emissions.
Keywords:	Economic complexity, economic development, supervised learning, canonical correlation analysis, principal component analysis
JEL:	F14 F63 O11
Date:	2022–04–22
URL:	http://d.repec.org/n?u=RePEc:unm:unumer:2022015&r=

A time-varying study of Chinese investor sentiment, stock market liquidity and volatility: Based on deep learning BERT model and TVP-VAR model

By:	Chenrui Zhang; Xinyi Wu; Hailu Deng; Huiwei Zhang
Abstract:	Based on the commentary data of the Shenzhen Stock Index bar on the EastMoney website from January 1, 2018 to December 31, 2019. This paper extracts the embedded investor sentiment by using a deep learning BERT model and investigates the time-varying linkage between investment sentiment, stock market liquidity and volatility using a TVP-VAR model. The results show that the impact of investor sentiment on stock market liquidity and volatility is stronger. Although the inverse effect is relatively small, it is more pronounced with the state of the stock market. In all cases, the response is more pronounced in the short term than in the medium to long term, and the impact is asymmetric, with shocks stronger when the market is in a downward spiral.
Date:	2022–05
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2205.05719&r=

AI Watch: Estimating AI investments in the European Union

By:	Tatjana Evas (European Commission – DG CNECT); Maikki Sipinen (European Commission – DG CNECT); Martin Ulbrich (European Commission – DG CNECT); Alessandro Dalla Benetta (European Commission - JRC); Maciej Sobolewski (European Commission - JRC); Daniel Nepelski (European Commission - JRC)
Abstract:	This report provides estimates of AI investments in the EU between 2018 and 2020 and, for selected investments categories, in the UK and the US. It considers AI as a general-purpose technology and, besides direct investments in the development and adoption of AI technologies, also includes investments in complementary assets and capabilities such as skills, data, product design and organisational capital among AI investments. According to current estimates, in 2020 the EU invested EUR 12.7-16 billion in AI. In 2020, due to the COVID19 outbreak, the EU AI investments grew by 20-28%, compared to a growth of 43-51% in 2019.
Keywords:	General Purpose Technology, Artificial Intelligence, digital technologies, investments, intangibles, Europe
Date:	2022–05
URL:	http://d.repec.org/n?u=RePEc:ipt:iptwpa:jrc129174&r=

Aligned with Whom? Direct and Social Goals for AI Systems

By:	Anton Korinek; Avital Balwit
Abstract:	As artificial intelligence (AI) becomes more powerful and widespread, the AI alignment problem—how to ensure that AI systems pursue the goals that we want them to pursue—has garnered growing attention. This article distinguishes two types of alignment problems depending on whose goals we consider, and analyzes the different solutions necessitated by each. The direct alignment problem considers whether an AI system accomplishes the goals of the entity operating it. In contrast, the social alignment problem considers the effects of an AI system on larger groups or on society more broadly. In particular, it also considers whether the system imposes externalities on others. Whereas solutions to the direct alignment problem center around more robust implementation, social alignment problems typically arise because of conflicts between individual and group-level goals, elevating the importance of AI governance to mediate such conflicts. Addressing the social alignment problem requires both enforcing existing norms on their developers and operators and designing new norms that apply directly to AI systems.
JEL:	D6 O3
Date:	2022–05
URL:	http://d.repec.org/n?u=RePEc:nbr:nberwo:30017&r=

Paying over the odds at the end of the fiscal year. Evidence from Ukraine

By:	Margaryta Klymak; Stuart Baumann
Abstract:	Governments are the largest buyers in most countries and they tend to operate budgets that expire at the end of the fiscal year. They also tend to spend disproportionately large amounts right at year-end. This use-it-or-lose-it spending pattern has been observed in a number of countries and is considered a problem due to possible waste. This could be the case if firms increase their prices to profit from a government’s greater demand at the end of the fiscal year. We investigate this previously unexplored possibility using a novel granular dataset of Ukrainian government procurement auctions over the period between 2017 and 2021. First, we document that the prices bid by firms are significantly higher in the last month of a fiscal year. Second, we employ a neural network technique to infer supplier costs from bidding behaviour. We estimate that suppliers charge around a 7.5% higher margin on less competitive tenders at the end of a fiscal year. Third, we demonstrate how results change depending on the type of the procured good, the length of the buyer-supplier relationship, and whether the procurement was expedited as a result of the Covid-19 pandemic. Our findings imply that substantial government funds could be saved if the extent of the year end spending could be moderated.
Date:	2022–04–08
URL:	http://d.repec.org/n?u=RePEc:oxf:wpaper:968&r=

Integration of Behavioral Economic Models to Optimize ML performance and interpretability: a sandbox example

By:	Emilio Soria-Olivas; Jos\'e E. Vila Gisbert; Regino Barranquero Carde\~nosa; Yolanda Gomez
Abstract:	This paper presents a sandbox example of how the integration of models borrowed from Behavioral Economic (specifically Protection-Motivation Theory) into ML algorithms (specifically Bayesian Networks) can improve the performance and interpretability of ML algorithms when applied to Behavioral Data. The integration of Behavioral Economics knowledge to define the architecture of the Bayesian Network increases the accuracy of the predictions in 11 percentage points. Moreover, it simplifies the training process, making unnecessary training computational efforts to identify the optimal structure of the Bayesian Network. Finally, it improves the explicability of the algorithm, avoiding illogical relations among variables that are not supported by previous behavioral cybersecurity literature. Although preliminary and limited to 0ne simple model trained with a small dataset, our results suggest that the integration of behavioral economics and complex ML models may open a promising strategy to improve the predictive power, training costs and explicability of complex ML models. This integration will contribute to solve the scientific issue of ML exhaustion problem and to create a new ML technology with relevant scientific, technological and market implications.
Date:	2022–05
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2205.01387&r=

"Daily Growth at Risk: financial or real drivers? The answer is not always the same".

By:	Helena Chuliá (Riskcenter, Institut de Recerca en Economia Aplicada (IREA), Departament d’Econometria, Estadística i Economia Aplicada, Universitat de Barcelona (UB).); Ignacio Garrón (Departament d’Econometria, Estadística i Economia Aplicada, Universitat de Barcelona (UB).); Jorge M. Uribe (Faculty of Economics and Business Studies, Open University of Catalonia.)
Abstract:	We estimate Growth-at-Risk (GaR) statistics for the US economy using daily regressors. We show that the relative importance, in terms of forecasting power, of financial and real variables is time varying. Indeed, the optimal forecasting weights of these types of variables were clearly different during the Global Financial Crisis and the recent Covid-19 crisis, which reflects the dissimilar nature of the two crises. We introduce the LASSO and the Elastic Net into the family of mixed data sampling models used to estimate GaR and show that these methods outperform past candidates explored in the literature. The role of the VXO and ADS indicators was found to be very relevant, especially in out-of-sample exercises and during crisis episodes. Overall, our results show that daily information for both real and financial variables is key for producing accurate point and tail risk nowcasts and forecasts of economic activity.
Keywords:	Vulnerable growth, Quantiles, Machine learning, Forecasting, Value at risk. JEL classification: E27, E44, E66.
Date:	2022–06
URL:	http://d.repec.org/n?u=RePEc:ira:wpaper:202208&r=

This nep-big issue is ©2022 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.

General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.

NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.