Abstract: |
Inexperienced consumers may have high uncertainty about experience goods that
require technical knowledge and skills to operate effectively; therefore,
experienced consumers' prior reviews can be useful for inexperienced ones.
However, a one-sided review system (e.g., Amazon.com) lets consumers write
reviews only as buyers and contains no feedback from sellers, so little
information about individual buyers is displayed. This study analyzes
consumers' digital footprints (DFs) to identify and predict unobserved
consumer preferences from online product
reviews. It uses Python along with high-performance computing to extract
reviewers' DFs for a specific product group (programmable thermostats) from a
dataset of 141 million Amazon reviews. It identifies
consumers' sentiment toward product content dimensions (PCDs), which are
extracted from review text through topic modeling and domain-expert
annotation. Questionable reviews (posted by "suspicious one-time reviewers" and
"always-the-same-rating reviewers") are excluded. This paper obtains three
main results. First, I find that the factors affecting consumer ratings are
(a) users' DFs (e.g., length of the product review, average rating across all
categories, and volume of prior reviews overall and in sub-categories), (b)
reviewers' attitudes toward eight product content dimensions (smart
connectivity, easiness, energy saving, functionality, support, price value,
privacy, and the Amazon effect), and (c) other prior reviewers' DFs (e.g.,
length of the review summary). All heteroskedastic ordered probit models that
include DF and sentiment variables fit better than the base model.
This paper is the first to identify the effect of the online platform's
(Amazon.com's) service quality on ratings. Second, extreme gradient boosting
(XGBoost) achieves the highest F1 score for predicting the ratings of
potential consumers before they make a purchase or write a review. All models
containing DF and sentiment variables predict better than the base model.
Classifications with fewer labels (three-class or binary) perform better than
the five-star rating classification.
However, performance on the minority class is low. Third, a convolutional
neural network (CNN) on top of Bidirectional Encoder Representations from
Transformers (BERT) embeddings achieves the highest F1 score for classifying
consumers' sentiment toward a specific PCD. Overall, the approach developed in
this paper is applicable, scalable,
and interpretable for distinguishing important drivers of consumer reviews for
different goods in a specific industry, and practitioners can use it to
identify and predict unobserved consumer preferences and sentiment associated
with product content dimensions. |
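The DF extraction and reviewer filtering summarized above can be illustrated with a small sketch. It is not the study's pipeline: the record fields (`reviewer`, `category`, `rating`, `time`, `text`), the toy data, and the helpers `is_questionable` and `digital_footprints` are hypothetical, and the filtering rules (a single review in total, or at least `min_same` reviews that all share one rating) are assumed operationalizations of "suspicious one-time reviewers" and "always-the-same-rating reviewers".

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records; the field names are assumptions, not the
# schema of the 141-million-review Amazon dataset.
REVIEWS = [
    {"reviewer": "A", "category": "Books", "rating": 3, "time": 1, "text": "Okay."},
    {"reviewer": "A", "category": "Thermostats", "rating": 5, "time": 2,
     "text": "Easy to install and saves energy."},
    {"reviewer": "B", "category": "Thermostats", "rating": 1, "time": 3,
     "text": "Wi-Fi keeps dropping."},
    {"reviewer": "C", "category": "Kitchen", "rating": 5, "time": 1, "text": "Great."},
    {"reviewer": "C", "category": "Toys", "rating": 5, "time": 2, "text": "Great."},
    {"reviewer": "C", "category": "Thermostats", "rating": 5, "time": 4, "text": "Great."},
]


def is_questionable(history, min_same=3):
    """Assumed filter: one review in total, or at least `min_same`
    reviews that all carry the same star rating."""
    if len(history) == 1:
        return True
    return len(history) >= min_same and len({r["rating"] for r in history}) == 1


def digital_footprints(reviews, target="Thermostats"):
    """Return one DF feature row per target-category review by a kept reviewer."""
    by_reviewer = defaultdict(list)
    for r in reviews:
        by_reviewer[r["reviewer"]].append(r)

    rows = []
    for reviewer, history in by_reviewer.items():
        if is_questionable(history):
            continue  # drop one-time and always-the-same-rating reviewers
        for review in (r for r in history if r["category"] == target):
            # Only reviews written before the target review count as "prior".
            prior = [r for r in history if r["time"] < review["time"]]
            rows.append({
                "reviewer": reviewer,
                "rating": review["rating"],  # outcome to explain or predict
                "review_length": len(review["text"].split()),  # in words
                "avg_prior_rating": mean(r["rating"] for r in prior) if prior else None,
                "n_prior_all": len(prior),
                "n_prior_same_category": sum(r["category"] == target for r in prior),
            })
    return rows


print(digital_footprints(REVIEWS))
```

On the toy data only reviewer A survives the filter: B has a single review and C gives every product five stars. Restricting averages and counts to reviews written before the target review keeps the features usable for predicting a rating before it is posted; at the scale of 141 million reviews the same grouping would more plausibly run in a distributed or chunked setting than in plain lists.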
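Why coarser label sets score higher while the minority class still lags can be seen with macro-averaged F1, which weights every class equally. The sketch below evaluates only hypothetical predictions, not XGBoost output, and the cut points in the hypothetical helper `collapse` (1 to 2 stars negative, 3 neutral, 4 to 5 positive; binary: 4 to 5 positive) are assumptions, since the abstract does not state how ratings were grouped.

```python
from collections import Counter


def collapse(star, scheme="three"):
    """Map a 1-5 star rating to a coarser label (assumed cut points)."""
    if scheme == "binary":
        return "pos" if star >= 4 else "neg"
    if scheme == "three":
        if star <= 2:
            return "neg"
        return "neu" if star == 3 else "pos"
    return star  # "five": keep the original star rating


def per_class_f1(y_true, y_pred):
    """F1 score for every label that appears in y_true or y_pred."""
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n_pred, n_true = Counter(y_pred), Counter(y_true)
    scores = {}
    for c in set(y_true) | set(y_pred):
        precision = tp[c] / n_pred[c] if n_pred[c] else 0.0
        recall = tp[c] / n_true[c] if n_true[c] else 0.0
        denom = precision + recall
        scores[c] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so minority classes count equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Toy star ratings and predictions (illustrative only, not the paper's data).
TRUE_STARS = [5, 5, 5, 4, 4, 1, 2, 3, 5, 4]
PRED_STARS = [5, 4, 5, 5, 4, 2, 1, 4, 5, 5]

for scheme in ("five", "three", "binary"):
    y_true = [collapse(s, scheme) for s in TRUE_STARS]
    y_pred = [collapse(s, scheme) for s in PRED_STARS]
    print(scheme, round(macro_f1(y_true, y_pred), 3))  # 0.2, 0.644, 0.867
```

With these toy labels the single three-star review forms a minority "neu" class whose F1 is 0, which drags the three-class macro average down even though most predictions are correct. scikit-learn's `f1_score(..., average="macro")` should give the same numbers and is the more practical choice for real evaluations.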
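For reference, the textbook heteroskedastic ordered probit, which may differ in detail from the specification estimated in the paper, models a star rating $y_i \in \{1, \dots, 5\}$ through ordered cut points $\kappa_j$ and a scale that depends on covariates $z_i$. Here $x_i$ is assumed to contain the DF and PCD sentiment variables; which variables enter $z_i$ is not stated above and is an assumption.

$$
\Pr(y_i = j \mid x_i, z_i)
= \Phi\!\left(\frac{\kappa_j - x_i^{\top}\beta}{\exp(z_i^{\top}\gamma)}\right)
- \Phi\!\left(\frac{\kappa_{j-1} - x_i^{\top}\beta}{\exp(z_i^{\top}\gamma)}\right),
\qquad j = 1, \dots, 5,
$$

with $\kappa_0 = -\infty$, $\kappa_5 = +\infty$, and $\Phi$ the standard normal CDF. Setting $\gamma = 0$ recovers the ordinary ordered probit, and comparing log-likelihoods or information criteria across nested specifications is one common way to assess the improvement in model fit described above.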