Latest revision as of 11:16, 2 December 2010
Citation
Turney, P., 2002, Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews, ACL'02
Online version
[[1]]
Summary
This paper presents a simple unsupervised learning algorithm for the opinion mining problem. The system classifies reviews as recommended ("thumbs up") or not recommended ("thumbs down"). The idea is to measure the semantic orientation of the phrases in a review and to assign the review to a class based on the average semantic orientation of those phrases. The semantic orientation of a phrase is measured as the mutual information between the phrase and the word "excellent" minus the mutual information between the phrase and the word "poor".
Description of the method
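The algorithm takes a written review as input. First, it assigns a POS tag to each word in the review in order to identify adjective or adverb phrases. The authors then use the PMI-IR algorithm to estimate the semantic orientation of each phrase. The Pointwise Mutual Information (PMI) between two words <math>w_1</math> and <math>w_2</math> is defined as follows:
<math>
PMI(w_1,w_2)=\log_2\left(\frac{p(w_1\ and\ w_2)}{p(w_1)p(w_2)}\right)
</math>
where <math>p(w_1\ and\ w_2)</math> is the probability that <math>w_1</math> and <math>w_2</math> co-occur. The semantic orientation of a phrase is defined as follows:
<math>
SO(phrase)=PMI(phrase,'excellent')-PMI(phrase,'poor')
</math>
Estimating each probability from hit counts, the terms that involve the phrase alone cancel, and the definition becomes:
<math>
SO(phrase)=\log_2\left(\frac{hits(phrase\ NEAR\ 'excellent')\,hits('poor')}{hits(phrase\ NEAR\ 'poor')\,hits('excellent')}\right)
</math>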
where the NEAR operator means that the two phrases must appear close to each other in the corpus. Using the above formula, we can calculate the average semantic orientation of the phrases in a review. The authors show that this average is usually positive for items that users tagged as "recommended" and usually negative for items tagged as "not recommended".
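The hit-count formula and the averaging step can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: the function names, the toy hit counts, and the small smoothing constant (added here only to avoid division by zero) are assumptions made for the example.

```python
import math


def semantic_orientation(hits_near_excellent, hits_near_poor,
                         hits_excellent, hits_poor, smoothing=0.01):
    """Estimate SO(phrase) from hit counts.

    Implements
        log2( hits(phrase NEAR 'excellent') * hits('poor')
              / (hits(phrase NEAR 'poor') * hits('excellent')) )
    The smoothing term is an assumption of this sketch; it keeps the
    ratio finite when a phrase never co-occurs with one of the words.
    """
    numerator = (hits_near_excellent + smoothing) * hits_poor
    denominator = (hits_near_poor + smoothing) * hits_excellent
    return math.log2(numerator / denominator)


def classify_review(phrase_orientations):
    """Label a review by the average SO of its extracted phrases."""
    if not phrase_orientations:
        raise ValueError("review has no extracted phrases")
    average = sum(phrase_orientations) / len(phrase_orientations)
    label = "recommended" if average > 0 else "not recommended"
    return label, average


# Hypothetical hit counts for two phrases (not taken from the paper).
HITS_EXCELLENT = 1000
HITS_POOR = 1000
so_positive = semantic_orientation(80, 10, HITS_EXCELLENT, HITS_POOR)
so_negative = semantic_orientation(5, 20, HITS_EXCELLENT, HITS_POOR)
label, average = classify_review([so_positive, so_negative])
print(label, round(average, 2))
```

A phrase that co-occurs more often with "excellent" than with "poor" gets a positive score, and the sign of the average over all phrases decides the label.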
Evaluation Results
To evaluate the technique, the authors chose 410 reviews from Epinions. A baseline classifier that always guesses the majority class achieves 59% accuracy, while the PMI-IR technique achieves 75% accuracy.
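For reference, the majority-class baseline used in this comparison can be computed as sketched below. The helper names and the ten toy labels are hypothetical and serve only to show the calculation; they are not the Epinions data.

```python
from collections import Counter


def majority_baseline_accuracy(gold_labels):
    """Accuracy of always predicting the most frequent gold label."""
    counts = Counter(gold_labels)
    majority_count = counts.most_common(1)[0][1]
    return majority_count / len(gold_labels)


def accuracy(predicted_labels, gold_labels):
    """Fraction of reviews whose predicted label matches the gold label."""
    correct = sum(p == g for p, g in zip(predicted_labels, gold_labels))
    return correct / len(gold_labels)


# Toy labels, invented for this example only.
gold = ["recommended"] * 6 + ["not recommended"] * 4
predicted = ["recommended"] * 5 + ["not recommended"] * 5

baseline = majority_baseline_accuracy(gold)
system = accuracy(predicted, gold)
print(baseline, system)
```

A classifier is only informative if its accuracy exceeds this baseline, which is why the two figures are reported together.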