Difference between revisions of "Turney,2002"

Revision as of 13:58, 1 December 2010

Citation

Turney, P., 2002, Thumbs up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews, ACL'02

Online version

[[1]]

Summary

This paper presents a simple unsupervised learning algorithm for the opinion mining problem. The system classifies reviews as recommended ("thumbs up") or not recommended ("thumbs down"). The idea is to measure the semantic orientation of the phrases in a review and to assign the review to a class based on their average semantic orientation. The semantic orientation of a phrase is measured as the mutual information between the phrase and the word "excellent" minus the mutual information between the phrase and the word "poor".

Description of the method

The algorithm takes a written review as input. First, a part-of-speech (POS) tag is assigned to each word in the document in order to identify phrases that contain adjectives or adverbs. The PMI-IR algorithm is then used to estimate the semantic orientation of each phrase. The Pointwise Mutual Information (PMI) between two words $w_{1}$ and $w_{2}$ is defined as follows:

$PMI(w_{1},w_{2})=\log_{2}\left(\frac{p(w_{1}\&w_{2})}{p(w_{1})\,p(w_{2})}\right)$

where $p(w_{1}\&w_{2})$ is the probability that $w_{1}$ and $w_{2}$ co-occur. The semantic orientation (SO) of a phrase is defined as follows:

$SO(phrase)=PMI(phrase,\text{"excellent"})-PMI(phrase,\text{"poor"})$
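As a minimal sketch of the PMI and SO definitions, the following Python computes both from probabilities. The function names and the toy probabilities are illustrative assumptions and do not come from the paper.

```python
import math


def pmi(p_joint: float, p_w1: float, p_w2: float) -> float:
    """Pointwise mutual information (base 2) of two words.

    p_joint is the probability that the words co-occur; p_w1 and p_w2
    are their individual probabilities. All three must be positive.
    """
    return math.log2(p_joint / (p_w1 * p_w2))


def semantic_orientation(pmi_excellent: float, pmi_poor: float) -> float:
    """SO(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor")."""
    return pmi_excellent - pmi_poor


# Toy probabilities, made up for illustration only.
p_phrase = 0.01
p_excellent = 0.02
p_poor = 0.02
so = semantic_orientation(
    pmi(0.0008, p_phrase, p_excellent),  # co-occurs often with "excellent"
    pmi(0.0002, p_phrase, p_poor),       # co-occurs rarely with "poor"
)
# so is approximately 2.0, i.e. a positive orientation
```

A PMI of zero means the two words co-occur exactly as often as independence would predict, so a phrase that is equally associated with both reference words gets an SO of zero.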

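Substituting the PMI definition into the SO definition, the $p(phrase)$ terms cancel. Estimating the remaining probabilities with search-engine hit counts, where $hits(q)$ is the number of documents returned for query $q$ and NEAR matches documents in which the two terms occur within a short distance of each other, gives the following formula: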

$SO(phrase)=\log_{2}\left(\frac{hits(phrase\ \text{NEAR}\ \text{"excellent"})\,hits(\text{"poor"})}{hits(phrase\ \text{NEAR}\ \text{"poor"})\,hits(\text{"excellent"})}\right)$
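The hit-count form of SO can be sketched in Python as below. This is an illustration under stated assumptions: the function name, the parameter names, and the default `smoothing` constant (added so that a zero NEAR count does not cause a division by zero or a logarithm of zero) are choices of this sketch, and the hit counts in the example are made up.

```python
import math


def so_from_hits(
    hits_near_excellent: float,
    hits_near_poor: float,
    hits_excellent: float,
    hits_poor: float,
    smoothing: float = 0.01,
) -> float:
    """Estimate SO(phrase) from search-engine hit counts.

    smoothing is added to the two NEAR counts so that a phrase that never
    co-occurs with one of the reference words still yields a finite value
    (an assumption of this sketch).
    """
    numerator = (hits_near_excellent + smoothing) * hits_poor
    denominator = (hits_near_poor + smoothing) * hits_excellent
    return math.log2(numerator / denominator)


# Hypothetical counts: the phrase appears near "excellent" four times as
# often as near "poor", and both reference words are equally frequent.
example_so = so_from_hits(80, 20, 1000, 1000, smoothing=0.0)
```

Swapping the two NEAR counts flips the sign of the result, which matches the intuition that SO is positive for phrases associated with "excellent" and negative for phrases associated with "poor".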

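The semantic orientation of each phrase in the document is then estimated. The last step is to compute the average semantic orientation of the phrases in the review and to classify the review as recommended if the average is positive and as not recommended otherwise.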
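The summary states that a review is classified by the average semantic orientation of its phrases. A minimal sketch of that final step follows; the function name, the label strings, and the handling of an average of exactly zero (treated here as not recommended) are assumptions of this example.

```python
from statistics import mean
from typing import Sequence


def classify_review(phrase_orientations: Sequence[float]) -> str:
    """Label a review from the SO values of its extracted phrases."""
    if not phrase_orientations:
        raise ValueError("review has no extracted phrases")
    average = mean(phrase_orientations)
    # Assumption: an average of exactly zero counts as not recommended.
    return "recommended" if average > 0 else "not recommended"


# Hypothetical SO values for the phrases of two reviews.
label_a = classify_review([2.0, -0.5, 1.0])
label_b = classify_review([-1.0, 0.5])
```

Because the decision depends only on the sign of the average, a few strongly oriented phrases can outweigh many weakly oriented ones.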

Evaluation Results

The system was tested on reviews of different cameras chosen from Amazon.com. Reviews of 6 cameras were manually annotated for use as training data, and the system was evaluated using 4-fold cross-validation. The system developed by Turney,2002 was used as the baseline for comparison. The results show that accuracy increases by a factor of 2 compared to the baseline system.
