Abstract: |
Reading and evaluating product reviews is central to how most people decide
what to buy and consume online. However, the recent emergence of Large
Language Models and Generative Artificial Intelligence means that writing
fraudulent or fake reviews is potentially easier than ever. Through three
studies we demonstrate that (1) humans are no longer able to distinguish
between real and fake product reviews generated by machines, averaging only
50.8% accuracy overall, essentially the rate expected by chance alone;
(2) that LLMs are likewise unable to distinguish between fake and real
reviews and perform as poorly as, or even worse than, humans; and (3) that
humans and LLMs pursue different strategies for evaluating authenticity,
which lead to similarly poor accuracy but different precision, recall, and
F1 scores, indicating that they fail at different aspects of the judgment. The
results reveal that review systems everywhere are now susceptible to
mechanised fraud unless they rely on trustworthy purchase verification to
guarantee the authenticity of reviewers. Furthermore, the results provide
insight into the consumer psychology of how humans judge authenticity,
demonstrating an inherent 'scepticism bias' towards positive reviews and a
particular vulnerability to misjudging the authenticity of fake negative
reviews. Additionally, the results provide a first insight into the 'machine
psychology' of judging fake reviews, revealing that the strategies LLMs use
to evaluate authenticity differ radically from those of humans, producing
equally poor accuracy but different patterns of misjudgment. |