nep-mkt New Economics Papers
on Marketing
Issue of 2020‒11‒30
one paper chosen by
Marco Novarese
Università del Piemonte Orientale

  1. Identifying Consumer Preferences from User- and Crowd-Generated Digital Footprints on by Leveraging Machine Learning and Natural Language Processing By Jikhan Jeong

  1. By: Jikhan Jeong
    Abstract: Inexperienced consumers may face high uncertainty about experience goods that require technical knowledge and skills to operate effectively; therefore, experienced consumers’ prior reviews can be useful for inexperienced ones. However, the one-sided review system (e.g., only provides the opportunity for consumers to write a review as a buyer and contains no feedback from the seller’s side, so the information displayed about individual buyers is limited. This study analyzes consumers’ digital footprints (DFs) to identify and predict unobserved consumer preferences from online product reviews. It makes use of Python coding along with high-performance computing to extract reviewers’ DFs for a specific product group (programmable thermostats) from a dataset of 141 million Amazon reviews. It identifies consumers’ sentiment toward product content dimensions (PCDs) extracted from review text by applying topic modeling and domain expert annotations. However, some questionable reviews (posted by “suspicious one-time reviewers” and “always-the-same rating reviewers”) are excluded. This paper obtains three main results: First, I find that the factors that affect consumer ratings are: (a) users’ DFs (e.g., length of the product review, average rating across all categories, volume of prior reviews overall and in sub-categories), (b) reviewers’ attitudes toward eight product content dimensions (smart connectivity, easiness, energy saving, functionality, support, price value, privacy, and the Amazon effect), and (c) other prior reviewers’ DFs (e.g., length of the review summary). All the heteroskedastic ordered probit models with DF and sentiment variables show a better model fit than the base model. This paper is the first to identify the effect of service quality of the online platform ( on ratings.
Second, extreme gradient boosting (XGBoost) is found to obtain the highest F1 score for predicting the ratings of potential consumers before they make a purchase or write a review. All the models containing DF and sentiment variables show a higher prediction performance than the base model. Classifications with fewer labels (three-class or binary classifications) show better prediction performance than the five-star rating classification. However, the performance for the minority class is low. Third, a convolutional neural network (CNN) on top of Bidirectional Encoder Representations from Transformers (BERT) embeddings shows the highest F1 score for classifying consumers’ sentiment toward a specific PCD. Overall, the approach developed in this paper is applicable, scalable, and interpretable for distinguishing important drivers of consumer reviews for different goods in a specific industry, and can be used by industry to identify and predict unobserved consumer preferences and sentiment associated with product content dimensions.
    JEL: D80 M21 M31 C45
    Date: 2020–11–10
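
    Illustration (not from the paper): the abstract mentions excluding reviews from “suspicious one-time reviewers” and “always-the-same rating reviewers”, and comparing five-star, three-class, and binary rating labels. The paper’s exact criteria and cut points are not given here, so the following is only a minimal Python sketch under stated assumptions: every reviewer with a single review is treated as a one-time reviewer, a reviewer with several identical ratings is treated as always-the-same, and the label cut points are hypothetical. The function names are invented for this sketch.

```python
from collections import defaultdict


def filter_reviews(reviews):
    """Drop reviews from one-time and always-the-same-rating reviewers.

    `reviews` is a list of (reviewer_id, rating) pairs. Both exclusion
    rules are illustrative assumptions, not the paper's actual criteria.
    """
    ratings_by_reviewer = defaultdict(list)
    for reviewer_id, rating in reviews:
        ratings_by_reviewer[reviewer_id].append(rating)

    kept = []
    for reviewer_id, rating in reviews:
        history = ratings_by_reviewer[reviewer_id]
        one_time = len(history) == 1
        always_same = len(history) > 1 and len(set(history)) == 1
        if not (one_time or always_same):
            kept.append((reviewer_id, rating))
    return kept


def to_three_class(rating):
    """Collapse a 1-5 star rating into three labels (assumed cut points)."""
    if rating <= 2:
        return "negative"
    if rating == 3:
        return "neutral"
    return "positive"


def to_binary(rating):
    """Collapse a 1-5 star rating into two labels (assumed cut point)."""
    return "positive" if rating >= 4 else "negative"


if __name__ == "__main__":
    sample = [("a", 5), ("b", 5), ("b", 5), ("c", 4), ("c", 2)]
    for reviewer_id, rating in filter_reviews(sample):
        print(reviewer_id, rating, to_three_class(rating), to_binary(rating))
```

    Coarser labels leave fewer classes to separate, which is consistent with the abstract’s report that three-class and binary classification score better than five-star classification.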

This nep-mkt issue is ©2020 by Marco Novarese. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.