Popescu and Etzioni, EMNLP 2005

This is a summary of a research paper, written as part of Social Media Analysis 10-802, Fall 2012.

Citation

Ana-Maria Popescu, Oren Etzioni, Extracting product features and opinions from reviews, Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, pp. 339-346, October 06-08, 2005, Vancouver, British Columbia, Canada.

Online Version

Direct PDF link

Abstract from the paper

Consumers are often forced to wade through many on-line reviews in order to make an informed product choice. This paper introduces OPINE, an unsupervised information-extraction system which mines reviews in order to build a model of important product features, their evaluation by reviewers, and their relative quality across products.
Compared to previous work, OPINE achieves 22% higher precision (with only 3% lower recall) on the feature extraction task. OPINE’s novel use of relaxation labeling for finding the semantic orientation of words in context leads to strong performance on the tasks of finding opinion phrases and their polarity.

Summary

Overview

This paper proposes methods for mining opinions from product reviews and classifying them as positive or negative with respect to specific product features. It breaks the problem into four sub-tasks:

  1. Identifying product features/attributes
  2. Mining opinions about product features
  3. Determining opinion polarity
  4. Ranking opinions based on their strength

To solve these sub-tasks, the paper introduces OPINE, an unsupervised review-mining system built on top of the KnowItAll web information extraction (IE) system. The authors mainly discuss the first three sub-tasks.

OPINE System

OPINE extracts product features, and the opinion phrases that describe them, for different product classes. It is built on top of the KnowItAll IE system, which computes point-wise mutual information (PMI) between each candidate fact for a given relation and a set of automatically generated discriminator phrases. These PMI scores are fed to a Naive Bayes classifier, which outputs a probability for each fact.
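To make the PMI assessment concrete, here is a minimal sketch. It assumes the hit-count form of PMI commonly used in KnowItAll-style systems, Hits(d + f) / (Hits(d) * Hits(f)), where f is a candidate fact and d is a discriminator phrase. The function name and all hit counts are hypothetical and invented for illustration; the Naive Bayes step that turns these scores into probabilities is not shown.

```python
def pmi_score(hits_joint: int, hits_discriminator: int, hits_fact: int) -> float:
    """Hit-count PMI: Hits(d + f) / (Hits(d) * Hits(f)).

    Returns 0.0 when either marginal count is zero, so that unseen
    phrases get the lowest possible score instead of raising an error.
    """
    if hits_discriminator == 0 or hits_fact == 0:
        return 0.0
    return hits_joint / (hits_discriminator * hits_fact)


# Hypothetical hit counts for the discriminator phrase "of scanner".
# These numbers are made up; a real system would obtain them from a
# search engine or from counts over the review corpus.
cover_score = pmi_score(hits_joint=2_000, hits_discriminator=50_000, hits_fact=5_000_000)
banana_score = pmi_score(hits_joint=4, hits_discriminator=50_000, hits_fact=4_000_000)

# A plausible part ("cover") should score higher than an unrelated noun.
print(cover_score > banana_score)
```

Under these invented counts, "cover" co-occurs with the discriminator far more often than chance would predict, so it receives the higher score and would be a stronger candidate feature.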

Finding Explicit Features

OPINE recursively extracts, for each product class, its parts and properties, then the parts and properties of those, and so on until no more candidates are found. It uses PMI scores from the KnowItAll system to assess candidate noun phrases as parts, and uses WordNet relations and morphological analysis to identify further parts and properties.
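The recursive expansion can be sketched as a worklist loop. This is an illustration under stated assumptions, not OPINE's implementation: `find_candidates` is a hypothetical callback standing in for the noun-phrase extraction and PMI assessment step, and the `CANDIDATES` table is invented toy data.

```python
from collections import deque
from typing import Callable, Dict, List

# Hypothetical lookup table: concept -> accepted parts/properties.
# In a real system this would come from parsing reviews and scoring
# candidate noun phrases, not from a hard-coded dictionary.
CANDIDATES: Dict[str, List[str]] = {
    "scanner": ["cover", "size"],
    "cover": ["hinge"],
    "size": [],
    "hinge": [],
}


def extract_features(product_class: str,
                     find_candidates: Callable[[str], List[str]]) -> List[str]:
    """Expand parts and properties level by level until nothing new appears."""
    seen = {product_class}
    features: List[str] = []
    queue = deque([product_class])
    while queue:
        concept = queue.popleft()
        for candidate in find_candidates(concept):
            if candidate not in seen:
                seen.add(candidate)
                features.append(candidate)
                queue.append(candidate)  # its own parts are explored later
    return features


features = extract_features("scanner", lambda c: CANDIDATES.get(c, []))
print(features)
```

The `seen` set guarantees termination even if the candidate relation contains cycles, which mirrors the "until no more candidates are found" stopping condition.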

Finding Opinion Phrases and Polarity

OPINE uses syntactic dependencies produced by the MINIPAR parser, together with a set of domain-independent extraction rules, to identify candidate opinion phrases in the vicinity of product features. It then keeps only those candidates whose head word has a positive or negative semantic orientation (SO).

To find the SO label for a given opinion word in the context of its feature and sentence, OPINE uses relaxation labeling (RL), an unsupervised classification technique. RL initializes the SO label of each word using the difference between its PMI values with positive and with negative keywords. It then applies neighborhood constraints, based on conjunctions, disjunctions, WordNet synsets, morphological cues and similar relations, to update the probability of each SO label over multiple iterations. Finally, based on the SO label for each (opinion word, feature, sentence) tuple, opinion phrases are labeled as positive, negative or neutral.
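The iterative update can be illustrated with a small sketch of relaxation labeling. It uses the standard RL update, in which each label probability is multiplied by (1 + support) and then renormalized. Everything else is an assumption made for illustration: the initial probabilities are set by hand rather than derived from PMI, the only constraints are "same" (for example, words joined by "and") and "opposite" (for example, words joined by "but"), and the support function is a simple weighted sum rather than the one used in the paper.

```python
from typing import Dict, List, Tuple

LABELS = ("positive", "negative", "neutral")
FLIP = {"positive": "negative", "negative": "positive", "neutral": "neutral"}

Probs = Dict[str, Dict[str, float]]


def relaxation_labeling(initial: Probs,
                        constraints: List[Tuple[str, str, str]],
                        iterations: int = 10,
                        weight: float = 1.0) -> Probs:
    """Iteratively update label probabilities from neighbor constraints."""
    probs = {word: dict(p) for word, p in initial.items()}
    for _ in range(iterations):
        updated: Probs = {}
        for word, p in probs.items():
            # Support for each label, accumulated over the word's neighbors.
            support = {label: 0.0 for label in LABELS}
            for a, b, relation in constraints:
                if word not in (a, b):
                    continue
                other = b if word == a else a
                for label in LABELS:
                    wanted = label if relation == "same" else FLIP[label]
                    support[label] += weight * probs[other][wanted]
            # Standard RL update: scale by (1 + support), then renormalize.
            scores = {label: p[label] * (1.0 + support[label]) for label in LABELS}
            total = sum(scores.values())
            updated[word] = {label: s / total for label, s in scores.items()}
        probs = updated  # all words are updated synchronously
    return probs


# Toy example: "hot" is ambiguous on its own, but appears in "hot and noisy".
initial = {
    "noisy": {"positive": 0.1, "negative": 0.8, "neutral": 0.1},
    "hot": {"positive": 0.3, "negative": 0.3, "neutral": 0.4},
}
result = relaxation_labeling(initial, [("hot", "noisy", "same")])
best_label = max(result["hot"], key=result["hot"].get)
print(best_label)
```

In this toy run the conjunction pulls the ambiguous word "hot" toward the label of its confidently negative neighbor, which is the kind of context-sensitive labeling that a context-independent lexicon lookup cannot provide.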

Evaluation

Finding Explicit Features

The authors use 7 different product classes obtained from amazon.com and compare OPINE with the opinion mining system of Hu and Liu, 2004. OPINE improves precision by 22% over Hu's system, at the cost of a 3% drop in recall. The authors attribute around 6% of the precision gain to PMI assessment on the review data and another 14% to Web PMI statistics from KnowItAll. To demonstrate the system's robustness, they also report 89% precision and 73% recall on a separate set of hotel reviews from tripadvisor.com and scanner reviews from amazon.com.

Opinion Phrase Extraction and Polarity Extraction

Baselines:

  • PMI++ - An extended version of Turney, 2002, which uses PMI statistics for the opinion word and the feature, but ignores the sentence, to obtain SO labels.
  • Hu++ - An extended version of Hu, 2004, which uses WordNet to obtain a context-independent SO label for a word.

For the task of finding the SO label of a word, the paper reports that OPINE achieves better precision than both baseline methods, with slightly lower recall than PMI++. Hu++ does well on strongly opinionated words, since a context-independent SO label is sufficient for such words in most contexts.
For opinion phrase extraction and opinion polarity, OPINE again outperforms both baseline methods in precision, with a slight drop in recall compared with PMI++.

Discussion

OPINE achieves good precision on opinion mining and polarity determination because it handles context-sensitive opinion words using the RL method. Recursively identifying the parts of a product also helps it find more product features.
