eScholarship
Open Access Publications from the University of California

UC Irvine

UC Irvine Electronic Theses and Dissertations

An Empirical Comparison of Machine Learning Methods for Text-based Sentiment Analysis of Online Consumer Reviews

Creative Commons 'BY-NC-ND' version 4.0 license
Abstract

The volume of digital, text-based consumer review data has increased dramatically, and many machine learning approaches now exist for automated text-based sentiment analysis. Marketing researchers have employed various methods for analyzing text reviews but lack a comprehensive comparison of their performance to guide method selection in future applications. We focus on the fundamental relationship between a consumer’s overall empirical evaluation and the text-based explanation of that evaluation. We study the empirical tradeoff between predictive and diagnostic abilities when applying various methods to estimate this relationship. We include both methods previously employed in the marketing literature and methods that are so far less common there. For generalizability, we analyze 25,241 products in nine product categories and 260,489 reviews across five review platforms. We find that neural network-based machine learning methods, in particular pre-trained versions, offer the most accurate predictions, while topic models such as Latent Dirichlet Allocation offer deeper diagnostics. However, neural network models are not suited for diagnostic purposes, and topic models are ill-equipped for making predictions. Consequently, future selection of methods to process text reviews is likely to be based on analysts’ goals of prediction versus diagnostics.
