Difference between revisions of "Turney,2002"

From Cohen Courses

Revision as of 15:00, 1 December 2010

Citation

Turney, P., 2002, Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews, ACL'02

Online version

[[1]]

Summary

This paper presents a simple unsupervised learning algorithm for the opinion mining problem. The system classifies reviews as recommended ("thumbs up") or not recommended ("thumbs down"). The idea is to measure the semantic orientation of the phrases in a review and to assign the review to a class based on the average semantic orientation of those phrases. The semantic orientation of a phrase is measured as the mutual information between the phrase and the word "excellent" minus the mutual information between the phrase and the word "poor".

Description of the method

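The algorithm takes a written review as the input. First, they assign a POS tag to each word in the document in order to identify phrases containing adjectives or adverbs. They then use the PMI-IR algorithm to estimate the semantic orientation of each phrase. The Pointwise Mutual Information (PMI) between two words <math>w_1</math> and <math>w_2</math> is defined as follows:

<math>
PMI(w_1,w_2)=\log_2\left(\frac{p(w_1\ and\ w_2)}{p(w_1)\,p(w_2)}\right)
</math>

where <math>p(w_1\ and\ w_2)</math> is the probability that <math>w_1</math> and <math>w_2</math> co-occur. They define the semantic orientation of a phrase as follows:

<math>
SO(phrase)=PMI(phrase,\text{"excellent"})-PMI(phrase,\text{"poor"})
</math>

If the probabilities are estimated from search-engine hit counts, where <math>hits(query)</math> is the number of hits returned for a query, the terms that depend only on the phrase cancel, and the above definition can be rewritten as the following formula:

<math>
SO(phrase)=\log_2\left(\frac{hits(phrase\ NEAR\ \text{"excellent"})\,hits(\text{"poor"})}{hits(phrase\ NEAR\ \text{"poor"})\,hits(\text{"excellent"})}\right)
</math>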
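To make the hit-count formulation concrete, here is a minimal Python sketch of the semantic orientation computation. It assumes the form SO(phrase) = log2( hits(phrase NEAR "excellent") hits("poor") / ( hits(phrase NEAR "poor") hits("excellent") ) ). The function name, its parameters, the `smoothing` constant and the example counts are all hypothetical, chosen only for illustration; the hit counts would have to come from whatever search engine or corpus index is available.

```python
import math


def semantic_orientation(hits_near_excellent, hits_near_poor,
                         hits_excellent, hits_poor, smoothing=0.01):
    """Estimate SO(phrase) from raw hit counts.

    `smoothing` is an assumption of this sketch: a small constant added to
    the two co-occurrence counts so that a zero count does not cause a
    division by zero or log of zero.
    """
    numerator = (hits_near_excellent + smoothing) * hits_poor
    denominator = (hits_near_poor + smoothing) * hits_excellent
    return math.log2(numerator / denominator)


# Made-up counts for illustration only (not taken from the paper).
so = semantic_orientation(hits_near_excellent=80, hits_near_poor=20,
                          hits_excellent=1000, hits_poor=1000)
print("positive" if so > 0 else "negative", so)
```

A phrase that co-occurs more often with "excellent" than with "poor", relative to how common the two reference words are, gets a positive score; swapping the two co-occurrence counts flips the sign.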



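Then they estimate the semantic orientation of each phrase in the document. The last step is to compute the average semantic orientation of the phrases in the review and to classify the review as recommended if the average is positive and as not recommended otherwise.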
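The final classification step, as described in the summary, averages the semantic orientation of the phrases in a review and uses that average to choose the class. The sketch below is one possible reading of that step. The function name, the zero threshold and the handling of an empty review are assumptions of this example, and the scores are made up.

```python
def classify_review(phrase_orientations):
    """Classify a review from the SO scores of its extracted phrases.

    Returns a (label, average) pair. Treating an average of exactly zero
    as "not recommended" is an assumption of this sketch.
    """
    if not phrase_orientations:
        raise ValueError("review contains no scored phrases")
    average = sum(phrase_orientations) / len(phrase_orientations)
    label = "recommended" if average > 0 else "not recommended"
    return label, average


# Hypothetical per-phrase scores for two reviews.
label_a, avg_a = classify_review([2.0, -0.5, 1.0])
label_b, avg_b = classify_review([-1.0, 0.5])
print(label_a, avg_a)
print(label_b, avg_b)
```

Returning the average alongside the label makes it easy to inspect borderline reviews whose score is close to zero.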

Evaluation Results

They have tested their system on reviews of different cameras chosen from Amazon.com. They manually annotated reviews of 6 cameras to use as the training data. The system was tested using 4-fold cross-validation. The system developed by Turney,2002 was used as the baseline for comparison. The results show that accuracy increases by a factor of 2 compared to the baseline system.