Popescu and Etzioni, EMNLP 2005
This is a summary of a research paper, written as part of Social Media Analysis 10-802, Fall 2012.
Citation
Ana-Maria Popescu, Oren Etzioni, Extracting product features and opinions from reviews, Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, pp. 339-346, October 06-08, 2005, Vancouver, British Columbia, Canada.
Online Version
Abstract from the paper
Consumers are often forced to wade through many on-line reviews in order to make an informed product choice. This paper introduces OPINE, an unsupervised information-extraction system which mines reviews in order to build a model of important product features, their evaluation by reviewers, and their relative quality across products.
Compared to previous work, OPINE achieves 22% higher precision (with only 3% lower recall) on the feature extraction task. OPINE’s novel use of relaxation labeling for finding the semantic orientation of words in context leads to strong performance on the tasks of finding opinion phrases and their polarity.
Summary
Overview
This paper proposes methods for mining opinions from product reviews and classifying them as positive or negative with respect to specific product features. It breaks the problem into four main subtasks:
- Identifying product features/attributes
- Mining opinions about product features
- Determining opinion polarity
- Ranking opinions based on their strength
To solve these subtasks, the paper introduces OPINE, an unsupervised review-mining system built on top of the KnowItAll web information extraction (IE) system. The authors mainly discuss the first three subtasks.
OPINE System
OPINE extracts product features, and the opinion phrases that describe them, for different product classes. It is built on top of the KnowItAll IE system, which computes point-wise mutual information (PMI) between candidate facts for a given relation and automatically generated discriminator phrases. The PMI scores are fed to a Naive Bayes classifier, which gives the probability associated with each fact.
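To make the PMI idea concrete, here is a minimal sketch of the textbook definition computed from co-occurrence counts. The function name `pmi` and its count-based parameters are assumptions made for illustration; KnowItAll works from web hit counts, and its exact scoring formula may differ from this one.

```python
import math


def pmi(joint_count, count_x, count_y, total):
    """Textbook pointwise mutual information, log2(p(x, y) / (p(x) * p(y))).

    All arguments are raw counts (hypothetical inputs): how often x and y
    occur together, how often each occurs alone, and the total number of
    observations.
    """
    if joint_count == 0:
        # x and y never co-occur: PMI is negative infinity.
        return float("-inf")
    # Algebraically equal to p_xy / (p_x * p_y), but avoids rounding error.
    ratio = (joint_count * total) / (count_x * count_y)
    return math.log2(ratio)
```

A score of 0 means the two items co-occur exactly as often as independence would predict; positive scores indicate association.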
Finding Explicit Features
For each product class, OPINE recursively extracts its parts and properties, then the parts and properties of those, until no more candidates are available. It uses the PMI scores from the KnowItAll system to identify candidate noun phrases for parts, and uses WordNet and morphological analysis to identify more parts and properties using relations.
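The "extract until no more candidates are available" loop can be sketched as a traversal over concepts. This is only an illustration: `extract_features`, `find_children`, and the toy lookup table are hypothetical names, the traversal uses a queue instead of literal recursion, and the real candidate assessment (PMI, WordNet, morphology) is replaced by a dictionary lookup.

```python
from collections import deque


def extract_features(product_class, find_children):
    """Collect parts/properties of a class, then of those, until none remain.

    `find_children` is a hypothetical callback that returns the accepted
    parts and properties of a concept.
    """
    seen = {product_class}
    features = []
    queue = deque([product_class])
    while queue:
        concept = queue.popleft()
        for child in find_children(concept):
            if child not in seen:  # avoid revisiting concepts (and cycles)
                seen.add(child)
                features.append(child)
                queue.append(child)
    return features


# Toy stand-in for the assessment step (assumed data, not from the paper).
TOY_RELATIONS = {"scanner": ["lid", "size"], "lid": ["hinge"]}


def toy_lookup(concept):
    return TOY_RELATIONS.get(concept, [])
```

Calling `extract_features("scanner", toy_lookup)` visits the scanner's direct features first and then the features of its parts.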
Finding Opinion Phrases and Polarity
OPINE uses syntactic dependencies produced by the MINIPAR parser, together with domain-independent extraction rules, to identify candidate opinion phrases in the vicinity of product features. It keeps only those candidates whose head word has a positive or negative semantic orientation (SO). To find the SO label of an opinion word given its feature and sentence, it uses relaxation labeling (RL), an unsupervised classification technique. RL relies on neighborhood constraints based on conjunctions, disjunctions, WordNet synsets, morphological cues and similar sources. It initializes the SO label of each word from the difference between its PMI values with positive and with negative keywords, then uses the neighborhood features to update the probability of each SO label over multiple iterations. Finally, based on the SO label for each (opinion word, feature, sentence) tuple, the opinion phrases are listed as positive, negative or neutral.
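The iterative update can be illustrated with a generic relaxation labeling sketch in the style of the classic Hummel and Zucker formulation. Everything here is an assumption made for illustration: the two constraint kinds ("same" for words joined by "and", "opposite" for words joined by "but"), the compatibility values, the `alpha` weight, and the initial distributions, which stand in for the PMI-based initialization. It is not OPINE's actual implementation.

```python
LABELS = ("positive", "negative", "neutral")


def compatibility(kind, label_a, label_b):
    """How much a neighbor holding label_b supports label_a (assumed values)."""
    if "neutral" in (label_a, label_b):
        return 0.0
    agree = label_a == label_b
    if kind == "same":       # e.g. "great and sturdy"
        return 1.0 if agree else -1.0
    if kind == "opposite":   # e.g. "cheap but flimsy"
        return -1.0 if agree else 1.0
    raise ValueError(f"unknown constraint kind: {kind}")


def relaxation_labeling(initial, edges, iterations=10, alpha=0.5):
    """Refine per-word label distributions using neighborhood constraints.

    initial: {word: {label: probability}}; edges: [(word_a, word_b, kind)].
    """
    probs = {word: dict(dist) for word, dist in initial.items()}
    neighbors = {word: [] for word in probs}
    for a, b, kind in edges:
        neighbors[a].append((b, kind))
        neighbors[b].append((a, kind))

    for _ in range(iterations):
        updated = {}
        for word, dist in probs.items():
            scores = {}
            for label in LABELS:
                # Support for this label, averaged over the word's neighbors.
                support = 0.0
                for other, kind in neighbors[word]:
                    for other_label in LABELS:
                        support += (compatibility(kind, label, other_label)
                                    * probs[other][other_label])
                if neighbors[word]:
                    support /= len(neighbors[word])
                scores[label] = dist[label] * (1.0 + alpha * support)
            total = sum(scores.values())
            updated[word] = {lab: s / total for lab, s in scores.items()}
        probs = updated
    return probs


# Toy example: "hot" starts out ambiguous but is conjoined with "great".
initial = {
    "great": {"positive": 0.90, "negative": 0.05, "neutral": 0.05},
    "hot": {"positive": 0.34, "negative": 0.33, "neutral": 0.33},
}
result = relaxation_labeling(initial, [("great", "hot", "same")])
```

In this toy run the ambiguous word drifts toward the label of its confident neighbor, which is the behavior the neighborhood constraints are meant to produce for context-sensitive opinion words.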
Evaluation
Finding Explicit Features
The authors use 7 product classes obtained from amazon.com and compare OPINE with the opinion mining system of Hu and Liu (2004). OPINE improves precision by 22% over Hu and Liu's system, with a 3% drop in recall. According to their analysis, OPINE gains around 6% precision from PMI assessment on the reviews and another 14% from Web PMI statistics from KnowItAll. To show the system's robustness, they report 89% precision and 73% recall on another set of reviews, covering hotels from tripadvisor.com and scanners from amazon.com.
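For readers unfamiliar with the metrics, the following sketch shows how precision and recall would be computed for a set of extracted features against a hand-labeled gold set. The function name and the example feature lists are made up for illustration and are not data from the paper.

```python
def precision_recall(extracted, gold):
    """Precision and recall of an extracted feature set against a gold set."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical example: 3 of 4 extracted features are correct,
# and 3 of the 5 gold features were found.
p, r = precision_recall(
    ["size", "lid", "price", "box"],
    ["size", "lid", "price", "weight", "speed"],
)
```

Higher precision with slightly lower recall, as reported for OPINE, means fewer spurious features at the cost of missing a few genuine ones.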
Opinion Phrase Extraction and Polarity Extraction
Baselines:
- PMI++: an extended version of Turney (2002) that uses PMI statistics for the opinion word and feature, ignoring the sentence, to obtain SO labels.
- Hu++: an extended version of Hu and Liu (2004) that uses WordNet to obtain a context-independent SO label for a word.
For the task of finding the SO label of a word, the paper reports that OPINE achieves better precision than both baseline methods, with slightly lower recall than PMI++. Hu++ does well on strongly opinionated words, since a context-independent SO label for such a word is enough in most contexts.
For opinion phrase extraction and opinion polarity, OPINE again outperforms both baseline methods in precision, with a slight drop in recall compared with PMI++.
Discussion
OPINE achieves good precision on opinion mining and polarity determination because the RL method handles context-sensitive opinion words. Recursively identifying the parts of a product also helps it find more product features.
Related Papers
- Pang, B., L. Lee, and S. Vaithyanathan. 2002. Thumbs up?: sentiment classification using machine learning techniques. In Proceedings of the ACL-02 conference on Empirical methods in natural language processing-Volume 10, 79–86. [One of the earliest works on sentiment analysis, which later inspired further work on review classification]
- Turney, P. D. 2002. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics, 417–424. [The authors use an extended version of this paper's proposed method as baseline]
- Hu, M., and B. Liu. 2004. Mining Opinion Features in Customer Reviews. In Proceedings of the Nineteenth National Conference on Artificial Intelligence. [The authors use an extended version of this paper's proposed method as baseline]
- Dave, K., S. Lawrence, and D. M. Pennock. 2003. Mining the peanut gallery: Opinion extraction and semantic classification of product reviews. WWW 2003. [This paper also addresses opinion mining on a similar product review dataset]
Study Plan
Resources useful for understanding this paper
- Article: Opinion Mining
- Paper: KnowItAll - O. Etzioni, M. Cafarella, D. Downey, S. Kok, A. Popescu, T. Shaked, S. Soderland, D. Weld, and A. Yates. 2005. Unsupervised named-entity extraction from the web: An experimental study. Artificial Intelligence, 165(1):91–134.
- Paper: Relaxation Labeling - R.A. Hummel and S.W. Zucker. 1983. On the foundations of relaxation labeling processes. In PAMI, pages 267–287.
- Paper: Semantic Orientation - Peter D. Turney, Michael L. Littman, Measuring praise and criticism: Inference of semantic orientation from association, ACM Transactions on Information Systems. 2003.