Turney,2002
Revision as of 13:58, 1 December 2010
Citation
Turney, P., 2002, Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews, ACL'02
Online version
[[1]]
Summary
This paper presents a simple unsupervised learning algorithm for the opinion mining problem. The system classifies a review as recommended ("thumbs up") or not recommended ("thumbs down"). The idea is to measure the semantic orientation of the phrases in a review and to assign the review to a class based on their average semantic orientation. The semantic orientation of a phrase is measured as the pointwise mutual information between the phrase and the word "excellent" minus the pointwise mutual information between the phrase and the word "poor".
Description of the method
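The algorithm takes a written review as input. First, it assigns a part-of-speech (POS) tag to each word in the review in order to identify phrases that contain adjectives or adverbs. It then uses the PMI-IR algorithm to estimate the semantic orientation of each phrase. The Pointwise Mutual Information (PMI) between two words <math>w_1</math> and <math>w_2</math> is defined as follows:
<math>
PMI(w_1,w_2)=\log_2\left(\frac{p(w_1 \& w_2)}{p(w_1)\,p(w_2)}\right)
</math>
where <math>p(w_1 \& w_2)</math> is the probability that <math>w_1</math> and <math>w_2</math> co-occur, and <math>p(w_1)</math> and <math>p(w_2)</math> are the probabilities of each word on its own. The semantic orientation of a phrase is defined as follows: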
<math>
SO(phrase)=PMI(phrase,\text{"excellent"})-PMI(phrase,\text{"poor"})
</math>
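Estimating each probability from the number of hits a search engine returns for a query (the total number of documents cancels out of the ratio), the definition can be rewritten as: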
<math>
SO(phrase)=\log_2\left(\frac{hits(phrase\ \mathrm{NEAR}\ \text{"excellent"})\cdot hits(\text{"poor"})}{hits(phrase\ \mathrm{NEAR}\ \text{"poor"})\cdot hits(\text{"excellent"})}\right)
</math>
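The two formulas can be checked with a short Python sketch. It assumes the hit counts have already been obtained from some search engine (no real search API is called here), and it adds a small smoothing constant, 0.01 in this sketch, to every count so that a zero count does not cause a division by zero; that constant is an assumption, not something stated above. The `pmi` helper computes the PMI definition directly from probabilities.

```python
import math


def pmi(p_joint: float, p_a: float, p_b: float) -> float:
    """Pointwise mutual information in bits: log2(p(a & b) / (p(a) * p(b)))."""
    return math.log2(p_joint / (p_a * p_b))


def so_from_hits(
    near_excellent: float,
    near_poor: float,
    hits_excellent: float,
    hits_poor: float,
    smoothing: float = 0.01,
) -> float:
    """Semantic orientation of a phrase estimated from hit counts.

    near_excellent: hits for (phrase NEAR "excellent")
    near_poor:      hits for (phrase NEAR "poor")
    hits_excellent: hits for "excellent" alone
    hits_poor:      hits for "poor" alone
    smoothing:      assumed constant added to every count to avoid
                    division by zero and log2(0)
    """
    numerator = (near_excellent + smoothing) * (hits_poor + smoothing)
    denominator = (near_poor + smoothing) * (hits_excellent + smoothing)
    return math.log2(numerator / denominator)


# Hypothetical counts: the phrase occurs near "excellent" 80 times and near
# "poor" 20 times, and both anchor words have 1000 hits on their own.
print(so_from_hits(80, 20, 1000, 1000))  # slightly below 2.0 because of smoothing
```

Because every probability is a hit count divided by the same total number of documents, that total cancels in the ratio, which is why `so_from_hits` never needs it. With `smoothing=0.0`, `so_from_hits` matches the difference of the two `pmi` values exactly, up to floating-point rounding.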
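Then they estimate the semantic orientation of each extracted phrase in the review. The last step computes the average semantic orientation of these phrases and classifies the review as recommended if the average is positive and as not recommended otherwise.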
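The extraction and classification steps can be sketched in Python as follows. The sketch assumes the review has already been POS-tagged with Penn Treebank tags and that a caller-supplied function `so` (hypothetical here) returns the semantic orientation of a phrase, for example via the hit-count formula. The two-word tag patterns used to pick out adjective and adverb phrases are an illustrative assumption rather than a verified copy of the paper's rules, and treating a review with no extracted phrases as having average 0 is also an assumption.

```python
from statistics import mean
from typing import Callable, List, Tuple

# Input is assumed to be already POS-tagged as (word, Penn Treebank tag) pairs.
Tagged = List[Tuple[str, str]]

ADJ = {"JJ"}
ADV = {"RB", "RBR", "RBS"}
NOUN = {"NN", "NNS"}
VERB = {"VB", "VBD", "VBN", "VBG"}


def extract_phrases(tagged: Tagged) -> List[str]:
    """Collect two-word phrases that contain an adjective or adverb.

    The tag patterns are an illustrative assumption.
    """
    phrases = []
    for i in range(len(tagged) - 1):
        (w1, t1), (w2, t2) = tagged[i], tagged[i + 1]
        # Tag of the word following the pair ("" at the end of the review).
        t3 = tagged[i + 2][1] if i + 2 < len(tagged) else ""
        if (t1 in ADJ and t2 in NOUN) or (t1 in ADV and t2 in VERB):
            phrases.append(f"{w1} {w2}")
        elif (
            (t1 in ADV and t2 in ADJ)
            or (t1 in ADJ and t2 in ADJ)
            or (t1 in NOUN and t2 in ADJ)
        ) and t3 not in NOUN:
            # Skip pairs that are only the start of a longer noun phrase.
            phrases.append(f"{w1} {w2}")
    return phrases


def classify_review(tagged: Tagged, so: Callable[[str], float]) -> str:
    """Average the SO of the extracted phrases and threshold at zero."""
    scores = [so(p) for p in extract_phrases(tagged)]
    # Assumption: a review with no extracted phrases gets average 0.
    average = mean(scores) if scores else 0.0
    return "recommended" if average > 0 else "not recommended"


review = [("a", "DT"), ("great", "JJ"), ("camera", "NN"), ("with", "IN"),
          ("very", "RB"), ("poor", "JJ"), ("battery", "NN")]
toy_so = {"great camera": 2.5, "poor battery": -1.0}  # hypothetical SO values
print(extract_phrases(review))  # ['great camera', 'poor battery']
print(classify_review(review, lambda p: toy_so.get(p, 0.0)))  # recommended
```

In the toy review, "very poor" is not extracted because the next word is a noun, so only "poor battery" is scored; the average of 2.5 and -1.0 is 0.75, which is positive.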
Evaluation Results
They tested their system on reviews of different cameras chosen from Amazon.com. They manually annotated reviews of 6 cameras to use as training data, and the system was evaluated using 4-fold cross-validation. The system developed by Turney,2002 served as the baseline for comparison. The results show that the accuracy improves by a factor of 2 compared to the baseline system.
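A 4-fold evaluation of this kind can be sketched in Python as below. The fold assignment (example index modulo k), the `fit` callback that turns a training fold into a classifier, and the label strings are all assumptions for illustration; for an unsupervised classifier, `fit` can simply ignore its training fold.

```python
from typing import Any, Callable, List, Sequence, Tuple

Example = Tuple[Any, str]  # (review, gold label)


def k_fold_accuracy(
    examples: Sequence[Example],
    fit: Callable[[List[Example]], Callable[[Any], str]],
    k: int = 4,
) -> List[float]:
    """Accuracy on each of k folds; fold membership is index modulo k (assumed)."""
    scores = []
    for fold in range(k):
        train = [ex for i, ex in enumerate(examples) if i % k != fold]
        test = [ex for i, ex in enumerate(examples) if i % k == fold]
        if not test:
            continue  # more folds than examples
        classify = fit(train)  # an unsupervised system may ignore `train`
        correct = sum(classify(x) == y for x, y in test)
        scores.append(correct / len(test))
    return scores


# Hypothetical data: eight reviews with alternating gold labels.
data = [(f"review {i}", "recommended" if i % 2 == 0 else "not recommended")
        for i in range(8)]
always_yes = lambda train: (lambda x: "recommended")  # trivial classifier
print(k_fold_accuracy(data, always_yes))  # [1.0, 0.0, 1.0, 0.0]
```

Two systems are then compared by their mean fold accuracy; an improvement "by a factor of 2" means the system's mean accuracy is about twice the baseline's.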