nep-big 2022-06-13 papers

on Big Data

Issue of 2022‒06‒13
fifteen papers chosen by
Tom Coupé
University of Canterbury

AI Adoption in a Competitive Market By Joshua S. Gans
NFT Appraisal Prediction: Utilizing Search Trends, Public Market Data, Linear Regression and Recurrent Neural Networks By Shrey Jain; Camille Bruckmann; Chase McDougall
AI Adoption in a Monopoly Market By Joshua S. Gans
Acquisition of Costly Information in Data-Driven Decision Making By Lukas Janasek
Machine Learning based Framework for Robust Price-Sensitivity Estimation with Application to Airline Pricing By Ravi Kumar; Shahin Boluki; Karl Isler; Jonas Rauch; Darius Walczak
The Determinants of Internet User Skills in Europe By Leogrande, Angelo; Magaletti, Nicola; Cosoli, Gabriele; Giardinelli, Vito O. M.; Massaro, Alessandro
Practical Skills Demand Forecasting via Representation Learning of Temporal Dynamics By Maysa M. Garcia de Macedo; Wyatt Clarke; Eli Lucherini; Tyler Baldwin; Dilermando Queiroz Neto; Rogerio de Paula; Subhro Das
Predicting Political Ideology from Digital Footprints By Michael Kitchener; Nandini Anantharama; Simon Angus; Paul A. Raschky
Predicting Political Ideology from Digital Footprints By Michael Kitchener; Nandini Anantharama; Simon D. Angus; Paul A. Raschky
Leveraging Artificial Intelligence in the Cyber Workplace: Prospects and Limitations for the Cyber Economy By Agarwal, Shekhar; Dutta, Madhurima; Dutta, Ritvik; Krishna, Vijesh
Hot off the press: News-implied sovereign default risk By Dim, Chukwuma; Koerner, Kevin; Wolski, Marcin; Zwart, Sanne
AI, Trade and Creative Destruction: A First Look By Daniel Trefler; Ruiqi Sun
Nowcasting world GDP growth with high‐frequency data By Caroline Jardet; Baptiste Meunier
Russia's Ruble during the onset of the Russian invasion of Ukraine in early 2022: The role of implied volatility and attention By Štefan Lyócsa; Tomáš Plíhal
The role of sentiment in the US economy: 1920 to 1934 By Kabiri, Ali; James, Harold; Landon-Lane, John; Tuckett, David; Nyman, Rickard

AI Adoption in a Competitive Market

By:	Joshua S. Gans
Abstract:	Economists have often viewed the adoption of artificial intelligence (AI) as a standard process innovation where we expect that efficiency will drive adoption in competitive markets. This paper models AI based on recent advances in machine learning that allow firms to engage in better prediction. Using prediction of demand, it is demonstrated that AI adoption is a complement to variable inputs whose levels are directly altered by predictions and use is economised by them (that is, labour). It is shown that, in a competitive market, this increases the short-run elasticity of supply and may or may not increase average equilibrium prices. There are generically externalities in adoption with this reducing the profits of non-adoptees when variable inputs are important and increasing them otherwise. Thus, AI does not operate as a standard process innovation and its adoption may confer positive externalities on non-adopting firms. In the long-run, AI adoption is shown to generally lower prices and raise consumer surplus in competitive markets.
JEL:	D21 D41 D81 O31
Date:	2022–04
URL:	http://d.repec.org/n?u=RePEc:nbr:nberwo:29996&r=

NFT Appraisal Prediction: Utilizing Search Trends, Public Market Data, Linear Regression and Recurrent Neural Networks

By:	Shrey Jain; Camille Bruckmann; Chase McDougall
Abstract:	In this paper we investigate the correlation between NFT valuations and various features from three primary categories: public market data, NFT metadata, and social trends data.
Date:	2022–04
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2204.12932&r=

AI Adoption in a Monopoly Market

By:	Joshua S. Gans
Abstract:	The adoption of artificial intelligence (AI) prediction of demand by a monopolist firm is examined. It is shown that, in the absence of AI prediction, firms face complex trade-offs in setting price and quantity ahead of demand that impact on the returns of AI adoption. Different industrial environments with differing flexibility of prices and/or quantity ex post, also impact on AI returns as does the time horizon of AI prediction. While AI has positive benefits for firms in terms of profitability, its impact on average price and quantity, as well as consumer welfare, is more nuanced and critically dependent on environmental characteristics.
JEL:	D21 D81 O31
Date:	2022–04
URL:	http://d.repec.org/n?u=RePEc:nbr:nberwo:29995&r=

Acquisition of Costly Information in Data-Driven Decision Making

By:	Lukas Janasek (Institute of Economic Studies, Charles University & Institute of Information Theory and Automation, Czech Academy of Sciences, Prague, Czech Republic)
Abstract:	This paper formulates and solves an economic decision problem of the acquisition of costly information in data-driven decision making. The paper assumes an agent predicting a random variable utilizing several costly explanatory variables. Prior to the decision making, the agent learns about the relationship between the random variables utilizing its past realizations. During the decision making, the agent decides what costly variables to acquire and predicts using the acquired variables. The agent's utility consists of the correctness of the prediction and the costs of the acquired variables. To solve the decision problem, we split the decision process into two parts: acquisition of variables and prediction using the acquired variables. For the prediction, we propose an approach for training a single predictive model accepting any combination of acquired variables. For the acquisition, we propose two methods using supervised machine learning models: a backward estimation of the expected utility of each variable and a greedy acquisition of variables based on a myopic estimate of the expected utility. We evaluate the methods on two medical datasets. The results show that the methods acquire the costly variables efficiently.
Keywords:	costly information, data-driven decision-making, machine learning
JEL:	C44 C45 C52 C73 D81 D83
Date:	2022–05
URL:	http://d.repec.org/n?u=RePEc:fau:wpaper:wp2022_10&r=
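
The greedy, myopic acquisition rule described in the abstract above can be illustrated with a minimal sketch. It is not the author's implementation: the function `greedy_acquire`, the callable `expected_gain`, and all variable names and numbers are hypothetical. In the paper the utility estimates come from supervised machine learning models, whereas here a hand-written toy function stands in for them.

```python
from typing import Callable, Dict, FrozenSet, List


def greedy_acquire(
    costs: Dict[str, float],
    expected_gain: Callable[[FrozenSet[str], str], float],
) -> List[str]:
    """Acquire variables one at a time while the best myopic net utility is positive."""
    acquired: List[str] = []
    remaining = set(costs)
    while remaining:
        have = frozenset(acquired)
        # Myopic step: score each candidate by estimated gain minus its cost,
        # looking only one acquisition ahead.
        best = max(sorted(remaining), key=lambda v: expected_gain(have, v) - costs[v])
        if expected_gain(have, best) - costs[best] <= 0:
            break  # no remaining variable is worth its cost
        acquired.append(best)
        remaining.remove(best)
    return acquired


# Hypothetical toy numbers (assumption, not from the paper):
# each variable's gain halves for every variable already held.
BASE_GAIN = {"blood_test": 0.30, "scan": 0.50, "biopsy": 0.20}
COSTS = {"blood_test": 0.05, "scan": 0.20, "biopsy": 0.15}


def toy_gain(have: FrozenSet[str], candidate: str) -> float:
    return BASE_GAIN[candidate] * (0.5 ** len(have))


order = greedy_acquire(COSTS, toy_gain)
print(order)
```

With these toy numbers the rule stops before acquiring every variable, because the last candidate's estimated gain no longer covers its cost.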

Machine Learning based Framework for Robust Price-Sensitivity Estimation with Application to Airline Pricing

By:	Ravi Kumar; Shahin Boluki; Karl Isler; Jonas Rauch; Darius Walczak
Abstract:	We consider the problem of dynamic pricing of a product in the presence of feature-dependent price sensitivity. Based on the Poisson semi-parametric approach, we construct a flexible yet interpretable demand model where the price-related part is parametric while the remaining (nuisance) part of the model is non-parametric and can be modeled via sophisticated ML techniques. The estimation of price-sensitivity parameters of this model via direct one-stage regression techniques may lead to biased estimates. We propose a two-stage estimation methodology which makes the estimation of the price-sensitivity parameters robust to biases in the nuisance parameters of the model. In the first stage we construct the estimators of observed purchases and price given the feature vector using sophisticated ML estimators like deep neural networks. Utilizing the estimators from the first stage, in the second stage we leverage a Bayesian dynamic generalized linear model to estimate the price-sensitivity parameters. We test the performance of the proposed estimation schemes on simulated and real sales transaction data from the airline industry. Our numerical studies demonstrate that the two-stage approach provides more accurate estimates of price-sensitivity parameters than the direct one-stage approach.
Date:	2022–05
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2205.01875&r=
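
To illustrate why a two-stage estimate can be more robust than a direct one-stage regression, the sketch below uses a deliberately simplified stand-in: a linear "partialling-out" estimator in which the first stage predicts price and purchases from the feature by group means, and the second stage regresses the residuals on each other. This is not the paper's Poisson semi-parametric model or its Bayesian dynamic generalized linear model. The data, `TRUE_BETA`, and the helper names are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


def group_means(keys, values):
    """First-stage 'model': predict a value by the mean of its feature group."""
    buckets = defaultdict(list)
    for k, v in zip(keys, values):
        buckets[k].append(v)
    return {k: mean(v) for k, v in buckets.items()}


# Hypothetical noise-free data: group "b" has both higher base demand and higher prices,
# so the feature confounds the price-quantity relationship.
features = ["a", "a", "a", "b", "b", "b"]
prices = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
base = {"a": 10.0, "b": 20.0}
TRUE_BETA = -2.0  # assumed true price sensitivity
quantities = [base[f] + TRUE_BETA * p for f, p in zip(features, prices)]

# One-stage: regress quantity directly on price, ignoring the feature.
naive = slope(prices, quantities)

# Stage 1: predict price and quantity from the feature alone.
price_hat = group_means(features, prices)
qty_hat = group_means(features, quantities)

# Stage 2: regress the quantity residuals on the price residuals.
price_res = [p - price_hat[f] for f, p in zip(features, prices)]
qty_res = [q - qty_hat[f] for f, q in zip(features, quantities)]
two_stage = slope(price_res, qty_res)

print(naive, two_stage)
```

In this toy setting the direct regression even gets the sign of the price sensitivity wrong, while the residual-on-residual step recovers the assumed value.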

The Determinants of Internet User Skills in Europe

By:	Leogrande, Angelo; Magaletti, Nicola; Cosoli, Gabriele; Giardinelli, Vito O. M.; Massaro, Alessandro
Abstract:	This article identifies the determinants of “Internet User Skills” among European countries using data from the DESI index. The data were analyzed using the following econometric models: Panel Data with Fixed Effects, Panel Data with Random Effects, Pooled OLS, WLS, and WLS corrected for heteroskedasticity. The Elbow method and the Silhouette coefficient method were compared for the optimization of the number of clusters obtained by the k-Means algorithm. The result shows the presence of 5 clusters. A network analysis was carried out using the Euclidean distance, identifying two network structures among some of the analyzed countries. Subsequently, six different machine learning algorithms were compared for the prediction of the future value of the variable of interest. The result shows that the best predictor algorithm is Gradient Boosted Tree Regression, with the predicted variable expected to increase by 1.75%. A further comparison was then made of six algorithms using the increased data. The result shows that the best predictor is Simple Regression Tree. The variable of interest is predicted to decrease by 6.099%. Statistical errors improve on average by 32.43% in the transition between the original data and the increased data.
Keywords:	Innovation, and Invention: Processes and Incentives; Management of Technological Innovation and R&D; Diffusion Processes; Open Innovation
JEL:	O30 O31 O32 O33 O34
Date:	2022–05–18
URL:	http://d.repec.org/n?u=RePEc:pra:mprapa:113123&r=

Practical Skills Demand Forecasting via Representation Learning of Temporal Dynamics

By:	Maysa M. Garcia de Macedo; Wyatt Clarke; Eli Lucherini; Tyler Baldwin; Dilermando Queiroz Neto; Rogerio de Paula; Subhro Das
Abstract:	Rapid technological innovation threatens to leave much of the global workforce behind. Today's economy juxtaposes white-hot demand for skilled labor against stagnant employment prospects for workers unprepared to participate in a digital economy. It is a moment of peril and opportunity for every country, with outcomes measured in long-term capital allocation and the life satisfaction of billions of workers. To meet the moment, governments and markets must find ways to quicken the rate at which the supply of skills reacts to changes in demand. More fully and quickly understanding labor market intelligence is one route. In this work, we explore the utility of time series forecasts to enhance the value of skill demand data gathered from online job advertisements. This paper presents a pipeline which makes one-shot multi-step forecasts into the future using a decade of monthly skill demand observations based on a set of recurrent neural network methods. We compare the performance of a multivariate model versus a univariate one, analyze how correlation between skills can influence multivariate model results, and present predictions of demand for a selection of skills practiced by workers in the information technology industry.
Date:	2022–05
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2205.09508&r=
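
As a rough illustration of the data preparation behind one-shot multi-step forecasting, the sketch below slices a monthly series into (input window, multi-step target) pairs. It assumes "one-shot" means the model emits all forecast horizons at once rather than feeding its own predictions back in. The function `make_windows` and the demand numbers are hypothetical, and the recurrent network itself is omitted.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], n_in: int, n_out: int
) -> List[Tuple[List[float], List[float]]]:
    """Pair each run of n_in observations with the n_out observations that follow it."""
    pairs = []
    for start in range(len(series) - n_in - n_out + 1):
        inputs = list(series[start:start + n_in])
        targets = list(series[start + n_in:start + n_in + n_out])
        pairs.append((inputs, targets))
    return pairs


# Hypothetical monthly demand counts for a single skill.
monthly_demand = [10, 12, 13, 15, 18, 21]
pairs = make_windows(monthly_demand, n_in=3, n_out=2)
for inputs, targets in pairs:
    print(inputs, "->", targets)
```

A multivariate variant would stack several skills' series into each input window; the windowing logic stays the same.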

Predicting Political Ideology from Digital Footprints

By:	Michael Kitchener (SoDa Laboratories, Monash University); Nandini Anantharama (SoDa Laboratories, Monash University); Simon Angus (Department of Economics and SoDa Laboratories, Monash University); Paul A. Raschky (Department of Economics and SoDa Laboratories, Monash University)
Abstract:	This paper proposes a new method to predict individual political ideology from digital footprints on one of the world's largest online discussion forums. We compiled a unique data set from the online discussion forum Reddit that contains information on the political ideology of around 91,000 users as well as records of their comment frequency and the comments' text corpus in over 190,000 different subforums of interest. Applying a set of statistical learning approaches, we show that information about activity in non-political discussion forums alone can very accurately predict a user's political ideology. Depending on the model, we are able to predict the economic dimension of ideology with an accuracy of up to 90.63% and the social dimension with an accuracy of up to 82.02%. In comparison, using the textual features from actual comments does not improve predictive accuracy. Our paper highlights the importance of revealed digital behaviour to complement stated preferences from digital communication when analysing human preferences and behaviour using online data.
Keywords:	data mining, political ideology, digital footprint, Reddit
JEL:	D72
Date:	2022–06
URL:	http://d.repec.org/n?u=RePEc:ajr:sodwps:2022-01&r=

Predicting Political Ideology from Digital Footprints

By:	Michael Kitchener (SoDa Laboratories, Monash University); Nandini Anantharama (SoDa Laboratories, Monash University); Simon D. Angus (Department of Economics and SoDa Laboratories, Monash University); Paul A. Raschky (Department of Economics and SoDa Laboratories, Monash University)
Abstract:	This paper proposes a new method to predict individual political ideology from digital footprints on one of the world's largest online discussion forums. We compiled a unique data set from the online discussion forum Reddit that contains information on the political ideology of around 91,000 users as well as records of their comment frequency and the comments' text corpus in over 190,000 different subforums of interest. Applying a set of statistical learning approaches, we show that information about activity in non-political discussion forums alone can very accurately predict a user's political ideology. Depending on the model, we are able to predict the economic dimension of ideology with an accuracy of up to 90.63% and the social dimension with an accuracy of up to 83.09%. In comparison, using the textual features from actual comments does not improve predictive accuracy. Our paper highlights the importance of revealed digital behaviour to complement stated preferences from digital communication when analysing human preferences and behaviour using online data.
Keywords:	data mining, political ideology, digital footprint, Reddit
JEL:	A10
Date:	2022–06
URL:	http://d.repec.org/n?u=RePEc:mos:moswps:2022-12&r=

Leveraging Artificial Intelligence in the Cyber Workplace: Prospects and Limitations for the Cyber Economy

By:	Agarwal, Shekhar; Dutta, Madhurima; Dutta, Ritvik; Krishna, Vijesh
Abstract:	The shift to Industry 4.0, as well as the accompanying widespread digitization and integration of AI technology into the economic system, have laid the groundwork for a major shift toward the formation of a cyber economy: a sort of economy in which people are economic entities who interact with or are challenged by AI. This paper investigates these relationships and assesses the overall implications of the digital revolution for the profession and the future economic system. Scholars from a variety of disciplines discuss the problems and prospects of applying AI in business areas, and the role of individuals who work with digital channels. Finally, the paper explores the importance of, and choices for, teaching and educating workers in the digital era.
Date:	2022–04–04
URL:	http://d.repec.org/n?u=RePEc:osf:thesis:4yr8j&r=

Hot off the press: News-implied sovereign default risk

By:	Dim, Chukwuma; Koerner, Kevin; Wolski, Marcin; Zwart, Sanne
Abstract:	We develop a sovereign default risk index using natural language processing techniques and 10 million news articles covering over 100 countries. The index is a high-frequency measure of countries' default risk, particularly for those lacking market-based measures: it correlates with sovereign CDS spreads, predicts rating downgrades, and reflects default risk information not fully captured by CDS spreads. We assess the influence of sovereign default concerns on equity markets and find that spikes in the index are negatively associated with same-week market returns, which reverses over the next week, indicating that investors might overreact to default concerns. Equity markets' reaction to default concerns is more pronounced and persistent for countries with tight fiscal constraints. The response to global, compared to country-specific, default concerns is much stronger, underlining the relevance of global "push" factors for local asset prices.
Keywords:	Sovereign default,Credit risk,Equity returns,Machine learning,Natural language processing,Early warning indicators
JEL:	F30 G12 G15
Date:	2022
URL:	http://d.repec.org/n?u=RePEc:zbw:eibwps:202206&r=

AI, Trade and Creative Destruction: A First Look

By:	Daniel Trefler; Ruiqi Sun
Abstract:	Artificial Intelligence is a powerful new technology that will likely have large impacts on the size, direction and composition of international trade flows. Yet almost nothing is known empirically about this. One AI-enabled set of services that can be tracked resides in the palm of our hands: the Mobile Apps used by half the world's population. To analyze the impact of AI on international trade in mobile App services we merge 2014-2020 data on international downloads of mobile Apps with data on the AI patents held by each App's parent company. From this we build a measure of AI deployment. We instrument AI deployment using cost-shifters from the theory of comparative advantage: Countries with a large stock of AI expertise will have a comparative advantage producing AI-intensive Apps. We show the following IV results. (1) Bilateral Trade: AI deployment increases App downloads by a factor of six. (2) Variety Effects: AI deployment doubles the number of exported App varieties. (3) Creative Destruction: AI deployment increases creative destruction (entry and exit of Apps) and in 2020 the net effect was an increase in welfare of between 2.5% and 10.6%.
JEL:	F1 F12 F14
Date:	2022–04
URL:	http://d.repec.org/n?u=RePEc:nbr:nberwo:29980&r=

Nowcasting world GDP growth with high‐frequency data

By:	Caroline Jardet (Centre de recherche de la Banque de France - Banque de France); Baptiste Meunier (Centre de recherche de la Banque Centrale européenne - Banque Centrale Européenne, AMSE - Aix-Marseille Sciences Economiques - EHESS - École des hautes études en sciences sociales - AMU - Aix Marseille Université - ECM - École Centrale de Marseille - CNRS - Centre National de la Recherche Scientifique)
Abstract:	Although the Covid-19 crisis has shown how high-frequency data can help track the economy in real time, we investigate whether it can improve the nowcasting accuracy of world GDP growth. To this end, we build a large dataset of 718 monthly and 255 weekly series. Our approach builds on a Factor-Augmented MIxed DAta Sampling (FA-MIDAS), which we extend with a preselection of variables. We find that this preselection markedly enhances performance. This approach also outperforms a LASSO-MIDAS, another technique for dimension reduction in a mixed-frequency setting. Though we find that a FA-MIDAS with weekly data outperforms other models relying on monthly or quarterly data, we also point to asymmetries. Models with weekly data indeed perform similarly to other models during "normal" times but can strongly outperform them during "crisis" episodes, above all the Covid-19 period. Finally, we build a nowcasting model for world GDP annual growth incorporating weekly data that gives timely (one per week) and accurate forecasts (close to IMF and OECD projections but with a 1- to 3-month lead). Policy-wise, this can provide an alternative benchmark for world GDP growth during crisis episodes when sudden swings in the economy make usual benchmark projections (IMF's or OECD's) quickly outdated.
Keywords:	big data,high frequency,large factor models,mixed frequency,nowcasting,variable selection
Date:	2022
URL:	http://d.repec.org/n?u=RePEc:hal:journl:hal-03647097&r=

Russia's Ruble during the onset of the Russian invasion of Ukraine in early 2022: The role of implied volatility and attention

By:	Štefan Lyócsa; Tomáš Plíhal
Abstract:	The onset of the Russo-Ukrainian crisis has led to the rapid depreciation of the Russian ruble. In this study, we model intraday price fluctuations of the USD/RUB and the EUR/RUB exchange rates from the 1st of December 2021 to the 7th of March 2022. Our approach is novel in that instead of using daily (low-frequency) measures of attention and investors' expectations, we use intraday (high-frequency) data: Google searches and implied volatility to proxy investors' attention and expectations. We show that both approaches are useful in predicting intraday price fluctuations of the two exchange rates, although implied volatility encompasses intraday attention.
Date:	2022–05
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2205.09179&r=

The role of sentiment in the US economy: 1920 to 1934

By:	Kabiri, Ali; James, Harold; Landon-Lane, John; Tuckett, David; Nyman, Rickard
Abstract:	This paper investigates the role of sentiment in the US economy from 1920 to 1934 using digitised articles from The Wall Street Journal. We derive a monthly sentiment index and use a 10-variable vector error correction model to identify sentiment shocks that are orthogonal to fundamentals. We show the timing and strength of these shocks and their resultant effects on the economy using historical decompositions. Intermittent impacts of up to 15 per cent on industrial production, 10 per cent on the S&P 500 and bank loans, and 37 basis points for the credit risk spread suggest a large role for sentiment.
Keywords:	algorithmic text analysis; business sentiment; Great Depression; US interwar economy; EP/P016847/1; ESRC-NIESR Rebuilding Macroeconomics network
JEL:	N12 N22 E32 D89
Date:	2022–04–25
URL:	http://d.repec.org/n?u=RePEc:ehl:lserod:115109&r=

This nep-big issue is ©2022 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.

General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.

NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.