Difference between revisions of "Turney,2002"

From Cohen Courses

Latest revision as of 11:16, 2 December 2010

Citation

Turney, P., 2002, Thumbs up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews, ACL'02

Online version

[[1]]

Summary

This paper presents a simple unsupervised learning algorithm for the opinion mining problem. The system classifies reviews as recommended ("thumbs up") or not recommended ("thumbs down"). The idea is to measure the semantic orientation of the phrases in a review and assign the review to a class based on the average semantic orientation. The semantic orientation of a phrase is measured as the mutual information between the phrase and the word "excellent" minus the mutual information between the phrase and the word "poor".

Description of the method

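The algorithm takes a written review as input. First it assigns a POS tag to each word in the review in order to identify adjective or adverb phrases. It then uses the PMI-IR algorithm to estimate the semantic orientation of each phrase. The Pointwise Mutual Information (PMI) between two words <math>w_1</math> and <math>w_2</math> is defined as follows:

<math>PMI(w_1,w_2)=\log_2\left(\frac{p(w_1\ \&\ w_2)}{p(w_1)\,p(w_2)}\right)</math>

where <math>p(w_1\ \&\ w_2)</math> is the probability that <math>w_1</math> and <math>w_2</math> co-occur. The semantic orientation (SO) of a phrase is defined as follows:

<math>SO(phrase)=PMI(phrase,'excellent')-PMI(phrase,'poor')</math>

Estimating each probability from the number of hits that a search engine returns for the corresponding query, and cancelling the terms common to both PMI values, gives the following formula:

<math>SO(phrase)=\log_2\left(\frac{hits(phrase\ NEAR\ 'excellent')\,hits('poor')}{hits(phrase\ NEAR\ 'poor')\,hits('excellent')}\right)</math>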

where the NEAR operator means that the two terms must appear close to each other in the corpus. Using the above formula, we can calculate the average semantic orientation of the phrases in a review. The authors show that this average is usually positive for items that users tagged as "recommended" and usually negative for items tagged as "not recommended".
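
The scoring and classification steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the author's implementation: the function names, the hit counts in the usage example, and the small smoothing term added to the NEAR counts (to avoid division by zero) are all assumptions that do not come from this summary. The hit counts are assumed to have been obtained beforehand from a search engine.

```python
import math


def semantic_orientation(hits_near_excellent, hits_near_poor,
                         hits_excellent, hits_poor, smoothing=0.01):
    """Estimate SO(phrase) from hit counts.

    Implements log2( hits(phrase NEAR 'excellent') * hits('poor')
                   / (hits(phrase NEAR 'poor') * hits('excellent')) ).
    `smoothing` is an assumption of this sketch: it is added to the two
    NEAR counts so that a phrase with zero hits does not break the ratio.
    """
    numerator = (hits_near_excellent + smoothing) * hits_poor
    denominator = (hits_near_poor + smoothing) * hits_excellent
    return math.log2(numerator / denominator)


def classify_review(phrase_hits, hits_excellent, hits_poor):
    """Classify a review from the average SO of its phrases.

    `phrase_hits` is a list of (hits_near_excellent, hits_near_poor)
    pairs, one pair per extracted phrase.
    """
    if not phrase_hits:
        raise ValueError("the review contains no extracted phrases")
    scores = [
        semantic_orientation(near_exc, near_poor, hits_excellent, hits_poor)
        for near_exc, near_poor in phrase_hits
    ]
    average = sum(scores) / len(scores)
    label = "recommended" if average > 0 else "not recommended"
    return label, average


if __name__ == "__main__":
    # Hypothetical hit counts, invented purely for illustration.
    label, average = classify_review([(80, 20), (5, 40), (120, 30)],
                                     hits_excellent=1000, hits_poor=1000)
    print(label, round(average, 3))
```

Averaging the per-phrase scores, rather than counting positive and negative phrases, follows the description in the summary: the sign of the average semantic orientation decides the class.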

Evaluation Results

The authors evaluated the technique on 410 reviews from Epinions. A classifier that always guesses the majority class reaches 59% accuracy, while the PMI-IR technique achieves 75% accuracy.