nep-big 2021-10-25 papers

on Big Data

Issue of 2021‒10‒25
seventeen papers chosen by
Tom Coupé
University of Canterbury

Algorithms in the Marketplace: An Empirical Analysis of Automated Pricing in E-Commerce By Marcel Wieting; Geza Sapi
The ECB's tracker: nowcasting the press conferences of the ECB By Marozzi, Armando
How Parental Educational Investments Respond to Changes in Ability Belief: An Application of Big Data Techniques in Education By Gan, Tianqi
How Robust are Limit Order Book Representations under Data Perturbation? By Yufei Wu; Mahmoud Mahfouz; Daniele Magazzeni; Manuela Veloso
Estimating returns to special education: combining machine learning and text analysis to address confounding By Aurélien Sallin
Using Machine Learning to Predict Consumers’ Environmental Attitudes and Beliefs By Yektansani, Kiana; Azizi, SeyedSoroosh
The content and structure of reputation domains across human societies: a view from the evolutionary social sciences By Zachary Garfield; Ryan Schacht; Emily Post; Dominique Ingram; Andrea Uehling; Shane Macfarlan
Estimating returns to special education: combining machine learning and text analysis to address confounding By Sallin, Aurélien
Sector Volatility Prediction Performance Using GARCH Models and Artificial Neural Networks By Curtis Nybo
Business News and Business Cycles By Leland Bybee; Bryan T. Kelly; Asaf Manela; Dacheng Xiu
Extracting Firms' Short-Term Inflation Expectations from the Economy Watchers Survey Using Text Analysis By Jouchi Nakajima; Hiroaki Yamagata; Tatsushi Okuda; Shinnosuke Katsuki; Takeshi Shinohara
Illuminating the Effects of the US-China Tariff War on China's Economy By Davin Chor; Bingjing Li
Spatial regression graph convolutional neural networks: A deep learning paradigm for spatial multivariate distributions By Zhu, Di; Liu, Yu; Yao, Xin; Fischer, Manfred M.
Reputational Assets and Social Media Marketing Activeness: Empirical Insights from China By Johansson, Anders C.; Zhu, Zhen
Impact of public news sentiment on stock market index return and volatility By Anese, Gianluca; Corazza, Marco; Costola, Michele; Pelizzon, Loriana
Machine learning in energy forecasts with an application to high frequency electricity consumption data By Erik Heilmann; Janosch Henze; Heike Wetzel
Analysis of online narratives on the empowerment of women survivors of gender-based violence in Colombia using natural language processing By Susana Martínez-Restrepo; Lina Tafur; Juan G. Ocio; Caroline Brethenoux; Patrick Furey; Orlando Rivera

Algorithms in the Marketplace: An Empirical Analysis of Automated Pricing in E-Commerce

By:	Marcel Wieting (KU Leuven, Department of Management, Strategy and Innovation (MSI), Naamsestraat 69, 3000 Leuven, Belgium); Geza Sapi (Düsseldorf Institute for Competition Economics, Heinrich Heine University of Düsseldorf, Universitätsstraße 1, 40225 Düsseldorf, Deutschland)
Abstract:	We analyze algorithmic pricing on Bol.com, the largest online marketplace in the Netherlands and Belgium. Based on more than two months of pricing data for around 2,800 popular products, we find that algorithmic sellers can both increase and reduce the price of the Buy Box (the most prominently displayed offer for a product). Consistent with collusion, algorithms benefit from each other's presence: prices are particularly high when two algorithms bid against each other and there is a medium number of sellers in the market. We identify several algorithmic pricing patterns that are often associated with collusion. Algorithmic sellers are more likely to win the Buy Box, implying that consumers may face inflated prices more often. We also document efficiencies due to algorithmic pricing. With a sufficient number of competitors, algorithmic sellers reduce the Buy Box price and compete particularly fiercely. Algorithms also reduce prices in monopoly markets. We explain this by the inability of traditional product managers to adjust prices manually, product by product, across a large number of items, a shortcoming that automated agents may correct. Overall, our findings call for careful policy on pricing algorithms, one that considers both the risk of collusion and the need to preserve potential efficiencies.
Keywords:	Algorithmic pricing; Artificial intelligence; Collusion; Forensic economics
JEL:	D42 D82 L42
Date:	2021–09
URL:	http://d.repec.org/n?u=RePEc:net:wpaper:2106&r=

The ECB's tracker: nowcasting the press conferences of the ECB

By:	Marozzi, Armando
Abstract:	This paper proposes an econometric framework for nowcasting the monetary policy stance and decisions of the European Central Bank (ECB), exploiting the flow of conventional and textual data that become available between two consecutive press conferences. Decompositions of the updated nowcasts into the marginal contributions of individual variables are also provided, to shed light on the main drivers of the ECB's reaction function at every point in time. In out-of-sample nowcasting experiments, the model accurately tracks the ECB's monetary policy stance and decisions. The inclusion of textual variables contributes significantly to the gradual improvement of the model's performance.
Keywords:	dynamic factor model, forecasting, monetary policy, natural language processing
JEL:	E37 E47 E52
Date:	2021–10
URL:	http://d.repec.org/n?u=RePEc:ecb:ecbwps:20212609&r=

How Parental Educational Investments Respond to Changes in Ability Belief: An Application of Big Data Techniques in Education

By:	Gan, Tianqi
Keywords:	Research Methods/Statistical Methods, Consumer/Household Economics, International Development
Date:	2021–08
URL:	http://d.repec.org/n?u=RePEc:ags:aaea21:313988&r=

How Robust are Limit Order Book Representations under Data Perturbation?

By:	Yufei Wu; Mahmoud Mahfouz; Daniele Magazzeni; Manuela Veloso
Abstract:	The success of machine learning models in the financial domain is highly reliant on the quality of the data representation. In this paper, we focus on the representation of limit order book data and discuss the opportunities and challenges for learning representations of such data. We also experimentally analyse the issues associated with existing representations and present a guideline for future research in this area.
Date:	2021–10
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2110.04752&r=

Estimating returns to special education: combining machine learning and text analysis to address confounding

By:	Aurélien Sallin
Abstract:	While the number of students with identified special needs is increasing in developed countries, there is little evidence on academic outcomes and labor market integration returns to special education. I present results from the first ever study to examine short- and long-term returns to special education programs using recent methods in causal machine learning and computational text analysis. I find that special education programs in inclusive settings have positive returns on academic performance in math and language as well as on employment and wages. Moreover, I uncover a positive effect of inclusive special education programs in comparison to segregated programs. However, I find that segregation has benefits for some students: students with emotional or behavioral problems, and nonnative students. Finally, using shallow decision trees, I deliver optimal placement rules that increase overall returns for students with special needs and lower special education costs. These placement rules would reallocate most students with special needs from segregation to inclusion, which reinforces the conclusion that inclusion is beneficial to students with special needs.
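The placement rules mentioned above come from shallow decision trees. As a hedged illustration only (not the author's code), a depth-1 rule can be found by exhaustive search over a single split, assuming that per-student estimated returns for each programme are already available, for example from the causal machine learning step. The function name, data layout and example numbers are hypothetical.

```python
from typing import Optional, Sequence, Tuple

# (feature index, threshold, action if x <= threshold, action otherwise, total return)
Rule = Tuple[int, float, int, int, float]


def best_depth1_rule(
    features: Sequence[Sequence[float]],
    scores: Sequence[Sequence[float]],
) -> Rule:
    """Exhaustively search single-split placement rules.

    features[i][j]: covariate j of student i (hypothetical data layout).
    scores[i][a]:   estimated return of placing student i in programme a,
                    e.g. a = 0 for inclusion and a = 1 for segregation.
    Each side of the split gets one programme; the rule maximises the
    sum of estimated returns over all students.
    """
    if not features or not scores:
        raise ValueError("need at least one student")
    n_actions = len(scores[0])
    best: Optional[Rule] = None
    for j in range(len(features[0])):
        for threshold in sorted({row[j] for row in features}):
            # Total estimated return of each programme on each side of the split.
            left = [0.0] * n_actions
            right = [0.0] * n_actions
            for x, s in zip(features, scores):
                side = left if x[j] <= threshold else right
                for a in range(n_actions):
                    side[a] += s[a]
            a_left = max(range(n_actions), key=lambda a: left[a])
            a_right = max(range(n_actions), key=lambda a: right[a])
            total = left[a_left] + right[a_right]
            if best is None or total > best[4]:
                best = (j, threshold, a_left, a_right, total)
    if best is None:
        raise ValueError("need at least one feature")
    return best


# Hypothetical data: one binary covariate, two programmes.
rule = best_depth1_rule([[0.0], [0.0], [1.0]], [[1.0, 0.2], [0.8, 0.1], [0.1, 0.9]])
print(rule[:4])  # (0, 0.0, 0, 1): covariate 0 -> inclusion, covariate 1 -> segregation
```

Deeper trees repeat the same search recursively within each side; keeping the tree shallow is what makes the resulting rule easy to read as a placement guideline.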
Date:	2021–10
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2110.08807&r=

Using Machine Learning to Predict Consumers’ Environmental Attitudes and Beliefs

By:	Yektansani, Kiana; Azizi, SeyedSoroosh
Keywords:	Environmental Economics and Policy, Research Methods/Statistical Methods, Resource/Energy Economics and Policy
Date:	2021–08
URL:	http://d.repec.org/n?u=RePEc:ags:aaea21:313902&r=

The content and structure of reputation domains across human societies: a view from the evolutionary social sciences

By:	Zachary Garfield (IAST - Institute for Advanced Study in Toulouse); Ryan Schacht; Emily Post; Dominique Ingram; Andrea Uehling; Shane Macfarlan
Abstract:	Reputations are an essential feature of human sociality and the evolution of cooperation and group living. Much scholarship has focused on reputations, yet typically on a narrow range of domains (e.g. prosociality and aggressiveness), usually in isolation. Humans can develop reputations, however, from any collective information. We conducted exploratory analyses on the content, distribution and structure of reputation domain diversity across cultures, using the Human Relations Area Files ethnographic database. After coding ethnographic texts on reputations from 153 cultures, we used hierarchical modelling, cluster analysis and text analysis to provide an empirical view of reputation domains across societies. Findings suggest: (i) reputational domains vary cross-culturally, yet reputations for cultural conformity, prosociality, social status and neural capital are widespread; (ii) reputation domains are more variable for males than females; and (iii) particular reputation domains are interrelated, demonstrating a structure consistent with dimensions of human uniqueness. We label these features: cultural group unity, dominance, neural capital, sexuality, social and material success and supernatural healing. We highlight the need for future research on the evolution of cooperation and human sociality to consider a wider range of reputation domains, as well as their social, ecological and gender-specific variability.
Date:	2021–10–04
URL:	http://d.repec.org/n?u=RePEc:hal:journl:hal-03368986&r=

Estimating returns to special education: combining machine learning and text analysis to address confounding

By:	Sallin, Aurélien
Abstract:	While the number of students with identified special needs is increasing in developed countries, there is little evidence on academic outcomes and labor market integration returns to special education. I present results from the first ever study to examine short- and long-term returns to special education programs using recent methods in causal machine learning and computational text analysis. I find that special education programs in inclusive settings have positive returns on academic performance in math and language as well as on employment and wages. Moreover, I uncover a positive effect of inclusive special education programs in comparison to segregated programs. However, I find that segregation has benefits for some students: students with emotional or behavioral problems, and nonnative students. Finally, using shallow decision trees, I deliver optimal placement rules that increase overall returns for students with special needs and lower special education costs. These placement rules would reallocate most students with special needs from segregation to inclusion, which reinforces the conclusion that inclusion is beneficial to students with special needs.
Keywords:	returns to education, special education, inclusion, segregation, causal machine learning, computational text analysis
JEL:	H52 I21 I26 J14 C31 Z13
Date:	2021–10
URL:	http://d.repec.org/n?u=RePEc:usg:econwp:2021:09&r=

Sector Volatility Prediction Performance Using GARCH Models and Artificial Neural Networks

By:	Curtis Nybo
Abstract:	Artificial neural networks (ANNs) have recently seen success in volatility prediction, but the literature is divided on when an ANN should be used rather than the common GARCH model. This study compares the volatility prediction performance of ANN and GARCH models applied to stocks with low, medium, and high volatility profiles, with the aim of identifying which model should be used in each case. The volatility profiles comprise five sectors that cover all stocks in the U.S. stock market from 2005 to 2020. Three GARCH specifications and three ANN architectures are examined for each sector, and the most adequate model is carried forward to forecasting. The results indicate that the ANN model should be used for predicting the volatility of assets with low volatility profiles, and GARCH models for medium- and high-volatility assets.
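For readers unfamiliar with the GARCH baseline: in the standard GARCH(1,1) specification, the next period's conditional variance combines a constant, the latest squared return and the latest conditional variance. The abstract does not say which three GARCH specifications were used, so GARCH(1,1) below is only an illustration, and the parameters are assumed known (in practice they are estimated by maximum likelihood).

```python
from typing import List, Sequence


def garch11_variance(
    returns: Sequence[float], omega: float, alpha: float, beta: float
) -> List[float]:
    """Conditional variances sigma2[0..len(returns)] of a GARCH(1,1) model.

    sigma2[t] = omega + alpha * r[t-1]**2 + beta * sigma2[t-1]
    The last element is the one-step-ahead variance forecast.
    The recursion starts at the unconditional variance
    omega / (1 - alpha - beta), which requires alpha + beta < 1.
    """
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0, alpha + beta < 1")
    sigma2 = [omega / (1.0 - alpha - beta)]
    for r in returns:
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1])
    return sigma2


# Hypothetical parameters and one large (demeaned) return of 2.0.
print(garch11_variance([2.0], omega=0.1, alpha=0.1, beta=0.8))  # approx. [1.0, 1.3]
```

Comparing such forecasts with realised variance over a hold-out period is one way to score a GARCH model against an ANN for each volatility profile.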
Date:	2021–10
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2110.09489&r=

Business News and Business Cycles

By:	Leland Bybee; Bryan T. Kelly; Asaf Manela; Dacheng Xiu
Abstract:	We propose an approach to measuring the state of the economy via textual analysis of business news. From the full text of 800,000 Wall Street Journal articles for 1984–2017, we estimate a topic model that summarizes business news into interpretable topical themes and quantifies the proportion of news attention allocated to each theme over time. News attention closely tracks a wide range of economic activities and explains 25% of aggregate stock market returns. A text-augmented VAR demonstrates the large incremental role of news text in modeling macroeconomic dynamics. We use this model to retrieve the narratives that underlie business cycle fluctuations.
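The "news attention" idea can be illustrated as follows: once a topic model has assigned each article a vector of topic proportions, attention to a theme in a given period can be taken as the average proportion across that period's articles. This is a sketch under that assumption; the topic-model estimation itself is not shown, and the input format and example values are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def news_attention(
    articles: Sequence[Tuple[str, Sequence[float]]]
) -> Dict[str, List[float]]:
    """Average topic proportions by period.

    articles: (period, topic_proportions) pairs, where topic_proportions is
    one article's estimated share of each topic (summing to 1).
    Returns period -> average share of each topic across that period's articles.
    """
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for period, theta in articles:
        acc = sums.setdefault(period, [0.0] * len(theta))
        for k, share in enumerate(theta):
            acc[k] += share
        counts[period] = counts.get(period, 0) + 1
    return {p: [s / counts[p] for s in acc] for p, acc in sums.items()}


# Two hypothetical topics, three articles over two months.
print(news_attention([
    ("1984-01", [0.5, 0.5]),
    ("1984-01", [1.0, 0.0]),
    ("1984-02", [0.2, 0.8]),
]))  # {'1984-01': [0.75, 0.25], '1984-02': [0.2, 0.8]}
```

Each topic's share then forms a time series that can be related to economic activity or added as a variable in a VAR.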
JEL:	E32 G0
Date:	2021–10
URL:	http://d.repec.org/n?u=RePEc:nbr:nberwo:29344&r=

Extracting Firms' Short-Term Inflation Expectations from the Economy Watchers Survey Using Text Analysis

By:	Jouchi Nakajima (Bank of Japan); Hiroaki Yamagata (Bank of Japan); Tatsushi Okuda (Bank of Japan); Shinnosuke Katsuki (Bank of Japan); Takeshi Shinohara (Bank of Japan)
Abstract:	This paper discusses the Price Sentiment Index (PSI), a quantitative indicator of firms' outlook for general prices proposed by Otaka and Kan (2018). The PSI is developed from the textual data of the Economy Watchers Survey conducted by the Cabinet Office; it is computed by extracting firms' views from survey comments, using text analysis. In this paper, we revisit the PSI and quantitatively analyze the determinants of changes in the PSI and the relationship between the PSI and macroeconomic variables. We also address a shortcoming in the text analysis used for computing the PSI that we discover when examining the performance of the PSI since the COVID-19 outbreak. The results of our analyses show that the PSI tends to precede consumer prices by several months and that it reflects various factors affecting price developments, including demand factors associated with the business cycle and cost factors such as changes in raw materials prices and exchange rates. Our analysis suggests that the PSI is a useful monthly indicator of inflation expectations, in that it captures the price-setting stance of firms responding to the Economy Watchers Survey. While the PSI is subject to large short-term fluctuations, it can be used to complement other indicators used for the analysis of price developments such as the output gap, existing indicators of inflation expectations, and anecdotal information from various sources.
Keywords:	Inflation Expectations; Machine Learning; Text Analysis; Big Data
JEL:	C53 C55 E31 E37
Date:	2021–10–15
URL:	http://d.repec.org/n?u=RePEc:boj:bojwps:wp21e12&r=

Illuminating the Effects of the US-China Tariff War on China's Economy

By:	Davin Chor; Bingjing Li
Abstract:	How much has the US-China tariff war impacted economic outcomes in China? We address this question using high-frequency night lights data, together with measures of the trade exposure of fine grid locations constructed from Chinese firms' geo-coordinates. Exploiting within-grid variation over time and controlling extensively for grid-specific contemporaneous trends, we find that each 1 percentage point increase in exposure to the US tariffs was associated with a 0.59% reduction in night-time luminosity. We combine these with structural elasticities that relate night lights to economic outcomes, motivated by the statistical framework of Henderson et al. (2012). The negative impact of the tariff war was highly skewed across locations: While grids with negligible direct exposure to the US tariffs accounted for up to 70% of China's population, we infer that the 2.5% of the population in grids with the largest US tariff shocks saw a 2.52% (1.62%) decrease in income per capita (manufacturing employment) relative to unaffected grids. By contrast, we do not find significant effects from China's retaliatory tariffs.
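The step from night lights to income can be written out as a back-of-the-envelope conversion, assuming a log-linear relation \ln L = \psi \ln y between a grid's luminosity L and an economic outcome y, in the spirit of the Henderson et al. (2012) framework cited above. Here E is tariff exposure in percentage points and \psi is a placeholder symbol for the lights-to-outcome elasticity; the abstract does not report the elasticities used, so no number for the implied income effect is given.

```latex
\Delta \ln L_g \approx -0.0059\,\Delta E_g,
\qquad
\Delta \ln y_g \approx \frac{\Delta \ln L_g}{\psi}
             \approx -\frac{0.0059}{\psi}\,\Delta E_g
```

A larger \psi (lights respond more strongly to income) therefore implies a smaller income loss for the same observed dimming.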
JEL:	E01 F10 F13 F14 F16
Date:	2021–10
URL:	http://d.repec.org/n?u=RePEc:nbr:nberwo:29349&r=

Spatial regression graph convolutional neural networks: A deep learning paradigm for spatial multivariate distributions

By:	Zhu, Di; Liu, Yu; Yao, Xin; Fischer, Manfred M.
Abstract:	Geospatial artificial intelligence (GeoAI) has emerged as a subfield of GIScience that uses artificial intelligence approaches and machine learning techniques for geographic knowledge discovery. The non-regularity of data structures has recently led to different variants of graph neural networks in computer science, with graph convolutional neural networks among the most prominent; they operate on non-Euclidean structured data in which the number of connections per node varies and the nodes are unordered. These networks use graph convolution (whose operators are commonly called filters or kernels) in place of general matrix multiplication in at least one of their layers. This paper proposes spatial regression graph convolutional neural networks (SRGCNNs) as a deep learning paradigm capable of handling a wide range of geographical tasks in which multivariate spatial data need to be modeled and predicted. SRGCNNs are made feasible by their feature propagation mechanism, their spatial locality, and a semi-supervised training strategy. In the experiments, the paper demonstrates SRGCNNs on social media check-in data in Beijing and house price data in San Diego. The results indicate that a well-trained SRGCNN model can learn from samples and make reasonable predictions for unobserved locations. The paper also demonstrates the effectiveness of incorporating the idea of geographically weighted regression to handle heterogeneity between locations. Compared with conventional spatial regression approaches, SRGCNN-based models tend to produce much more accurate and stable results, especially when the sampling ratio is low. This study aims to bridge the methodological gap between graph deep learning and spatial regression analytics. The proposed idea serves as an example of how spatial analytics can be combined with state-of-the-art deep learning models, and may inform future research at the frontier of GeoAI.
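The graph convolution that replaces ordinary matrix multiplication can be sketched as a single layer using the common symmetric-normalised rule: add self-loops to the adjacency matrix, normalise by node degree, aggregate neighbours' features, then apply weights and a ReLU. This is a generic graph-convolution layer in plain Python for illustration, not the SRGCNN implementation; how the spatial adjacency is built, the semi-supervised training loop and the geographically weighted extension are not shown, and the weights are assumed given.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> Matrix:
    """Plain-Python matrix product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(
    adj: Sequence[Sequence[float]],
    features: Sequence[Sequence[float]],
    weights: Sequence[Sequence[float]],
) -> Matrix:
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    adj:      n x n symmetric, non-negative adjacency between locations
    features: n x f node features X
    weights:  f x h weight matrix W (learned in training; given here)
    """
    n = len(adj)
    # Self-loops let each location keep its own features in the aggregation.
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in a_tilde]
    # Symmetric degree normalisation keeps feature scales comparable across nodes.
    a_norm = [
        [a_tilde[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
        for i in range(n)
    ]
    hidden = matmul(matmul(a_norm, features), weights)
    return [[max(0.0, v) for v in row] for row in hidden]


# Two neighbouring locations with one feature each and an identity weight.
print(gcn_layer([[0.0, 1.0], [1.0, 0.0]], [[1.0], [3.0]], [[1.0]]))  # [[2.0], [2.0]]
```

Because each layer only mixes a node with its direct neighbours, stacking a few layers propagates information from observed to unobserved locations, which is the mechanism the semi-supervised setting relies on.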
Date:	2021–10–19
URL:	http://d.repec.org/n?u=RePEc:wiw:wus046:8360&r=

Reputational Assets and Social Media Marketing Activeness: Empirical Insights from China

By:	Johansson, Anders C. (Stockholm China Economic Research Institute); Zhu, Zhen (Kent Business School, University of Kent)
Abstract:	We explore the linkages between social media marketing activeness and reputational assets on digital platforms with a unique sample of over 8,000 customer-to-customer (C2C) sellers registered on both Taobao, China’s largest C2C online shopping platform, and Sina Weibo, China’s largest microblogging platform. A unique collaborative effort between the two platforms enables us to examine whether C2C sellers are motivated to engage in marketing activities on a separate social media platform. Applying machine learning and natural language processing methods, we first identify whether C2C sellers conduct social media marketing on their microblogs. We then differentiate between earned and owned reputation factors accumulated on both platforms and test their relationships to social media marketing activeness. We find that earned reputation factors on both platforms are significantly associated with social media marketing activeness. However, we identify a conflict of owned reputation factors between the two platforms, which provides a potential explanation for the limited success of the cross-platform collaboration.
Keywords:	social media marketing; reputational assets; electronic commerce; China
JEL:	L81 M15 M30 M31
Date:	2021–10–15
URL:	http://d.repec.org/n?u=RePEc:hhs:hascer:2021-053&r=

Impact of public news sentiment on stock market index return and volatility

By:	Anese, Gianluca; Corazza, Marco; Costola, Michele; Pelizzon, Loriana
Abstract:	Recent advances in natural language processing have contributed to the development of market sentiment measures based on text content analysis of news providers and social media. The effectiveness of these sentiment variables depends on the techniques implemented and the type of source on which they are based. In this paper, we investigate the impact of the release of public financial news on the S&P 500. Using automatic labeling techniques based on either stock index returns or dictionaries, we set up a classification problem solved with long short-term memory (LSTM) neural networks to extract alternative proxies of investor sentiment. Our findings provide evidence that these sentiment measures have an impact on the market within a 20-minute time frame. We find that dictionary-based sentiment provides meaningful results, unlike sentiment based on stock index returns, which partly fails to map news onto financial returns.
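The dictionary-based labelling step can be sketched as follows: count positive and negative terms in a news item and label it by the sign of its net tone. The word lists and the net-tone formula below are illustrative assumptions, not the paper's dictionaries; in the study such labels serve as training targets for an LSTM classifier, which is not shown.

```python
import re
from typing import AbstractSet

# Hypothetical, deliberately tiny word lists for illustration only.
POSITIVE = frozenset({"gain", "gains", "growth", "strong", "upgrade", "beat"})
NEGATIVE = frozenset({"loss", "losses", "decline", "weak", "downgrade", "miss"})


def dictionary_sentiment(
    text: str,
    positive: AbstractSet[str] = POSITIVE,
    negative: AbstractSet[str] = NEGATIVE,
) -> float:
    """Net tone in [-1, 1]: (pos - neg) / (pos + neg), or 0.0 if no listed word appears."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(1 for t in tokens if t in positive)
    neg = sum(1 for t in tokens if t in negative)
    if pos + neg == 0:
        return 0.0
    return (pos - neg) / (pos + neg)


def label(text: str) -> str:
    """Map the net tone to a class label usable as a training target."""
    score = dictionary_sentiment(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(label("Strong growth despite one loss"))  # positive
print(label("Weak quarter, losses widen"))      # negative
```

The return-based alternative would instead label each news item by the sign of the index return in a window after its release, which is where the mapping between news and returns can break down.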
Keywords:	Public financial news, Stock market, NLP, Dictionary, LSTM neural networks, Investor sentiment, S&P 500
JEL:	G14 G17 C45 C63
Date:	2021
URL:	http://d.repec.org/n?u=RePEc:zbw:safewp:322&r=

Machine learning in energy forecasts with an application to high frequency electricity consumption data

By:	Erik Heilmann (University of Kassel); Janosch Henze (University of Kassel); Heike Wetzel (University of Kassel)
Abstract:	Forecasting plays an essential role in energy economics. With new challenges and use cases in the energy system, forecasts have to meet more complex requirements, such as increasing temporal and spatial resolution of data. The concept of machine learning can meet these requirements by providing different model approaches and a standardized process of model selection. This paper provides a concise and comprehensible introduction to the topic by discussing the concept of machine learning in the context of energy economics and presenting an exemplary application to electricity load data. For this, we introduce and demonstrate the structured machine learning process containing the preparation, model selection and test of forecast models. This process is intended to serve as a general guideline for energy economists and practitioners who need to apply sophisticated forecast models.
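The structured process of preparation, model selection and testing hinges on a chronological split, so that the final test block plays no part in choosing the model. A minimal sketch, assuming a univariate load series and using three simple benchmark forecasters as hypothetical stand-ins for the paper's candidate models (such as neural networks); because these benchmarks have no fitted parameters, the earliest part of the series only supplies history.

```python
from typing import Callable, Dict, Sequence, Tuple

# A forecaster maps the history observed so far to a one-step-ahead forecast.
Forecaster = Callable[[Sequence[float]], float]


def naive(history: Sequence[float]) -> float:
    """Repeat the last observation."""
    return history[-1]


def make_seasonal_naive(season: int) -> Forecaster:
    """Repeat the value observed one season earlier (e.g. 24 for hourly data)."""
    return lambda history: history[-season]


def make_moving_average(window: int) -> Forecaster:
    """Average of the last `window` observations."""
    return lambda history: sum(history[-window:]) / window


def mae(model: Forecaster, series: Sequence[float], start: int, end: int) -> float:
    """Mean absolute one-step-ahead error on series[start:end];
    each forecast only sees data before its target point."""
    errors = [abs(series[t] - model(series[:t])) for t in range(start, end)]
    return sum(errors) / len(errors)


def select_and_test(
    series: Sequence[float],
    models: Dict[str, Forecaster],
    n_val: int,
    n_test: int,
) -> Tuple[str, float]:
    """Pick the model with the lowest validation MAE, then report its
    MAE on the final test block, which played no role in the choice."""
    n = len(series)
    val_start, test_start = n - n_val - n_test, n - n_test
    val_scores = {name: mae(m, series, val_start, test_start) for name, m in models.items()}
    best = min(val_scores, key=val_scores.get)
    return best, mae(models[best], series, test_start, n)


# Hypothetical series with a period-4 cycle.
series = [1.0, 2.0, 3.0, 4.0] * 6
models = {
    "naive": naive,
    "seasonal_naive": make_seasonal_naive(4),
    "moving_average": make_moving_average(4),
}
print(select_and_test(series, models, n_val=4, n_test=4))  # ('seasonal_naive', 0.0)
```

Swapping in a neural network only changes what sits behind a `Forecaster`; the split, validation and test logic stays the same, which is what makes the process reusable as a guideline.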
Keywords:	machine learning, electricity consumption forecast, artificial neural network, time series forecast
JEL:	C45 C53 Q47
Date:	2021
URL:	http://d.repec.org/n?u=RePEc:mar:magkse:202135&r=

Analysis of online narratives on the empowerment of women survivors of gender-based violence in Colombia using natural language processing

By:	Susana Martínez-Restrepo; Lina Tafur; Juan G. Ocio; Caroline Brethenoux; Patrick Furey; Orlando Rivera
Abstract:	This study uses Big Data and natural language processing to understand 4.7 million online narratives in Colombia about the gender-based violence experienced by survivors and related to their empowerment process. It performs sentiment and topic analysis of first-person narratives of gender-based violence posted online by survivors between November 2016 and February 2020. This methodology makes it possible to identify emotional states (Sentiment Drivers) in order to understand central aspects of the empowerment process (or the lack thereof) within Kabeer's framework.
Keywords:	Big Data, Women, Violence, Gender-Based Violence, Gender, Economic Empowerment, Colombia
JEL:	C55 J12 J16 I38 D74
Date:	2021–08–31
URL:	http://d.repec.org/n?u=RePEc:col:000124:019664&r=

This nep-big issue is ©2021 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.

General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.

NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.
