Esuli and Sebastiani ACT2007

Latest revision as of 14:40, 2 October 2012

Citation

PageRanking WordNet Synsets: An Application to Opinion Mining,

Andrea Esuli and Fabrizio Sebastiani

Online version

PageRanking WordNet Synsets: An Application to Opinion Mining: http://acl.ldc.upenn.edu/P/P07/P07-1054.pdf

Summary

The paper addresses word-level sentiment analysis. Its key idea is to use the PageRank algorithm to rank WordNet synsets by how strongly positive or negative they are. Because PageRank is a well-studied algorithm, the most challenging part is constructing a meaningful directed graph from WordNet. The authors explore one relation: if the gloss of synset si contains a term belonging to synset sk, an edge si -> sk is drawn.

The authors experimented on one benchmark dataset: Micro-WNOp.

Discussion

This paper addresses the problem of judging how positive, negative, or neutral a word (more precisely, a WordNet synset) is, which is one of the major tasks in sentiment analysis. The authors propose running the PageRank algorithm on a graph built over WordNet synsets. The intuition is that if a synset sk contributes to the definition of synset si, by virtue of its member terms occurring in the gloss of si, then the polarity of sk contributes to the polarity of si. Accordingly, they build the graph G = (V, E), where V is the set of all WordNet synsets and the edge (si -> sk) is in E if and only if the gloss of si contains a term belonging to sk.
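The graph construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the four toy synsets, their glosses, the damping factor of 0.85, the iteration count, and the optional `seeds` parameter are all hypothetical choices made for the example.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical toy data (not taken from WordNet):
# synset id -> (member terms, gloss tokens).
SYNSETS: Dict[str, Tuple[Set[str], List[str]]] = {
    "good.a.01": ({"good"}, ["having", "desirable", "qualities"]),
    "desirable.a.01": ({"desirable"}, ["worth", "having", "good"]),
    "bad.a.01": ({"bad"}, ["not", "good"]),
    "awful.a.01": ({"awful", "terrible"}, ["extremely", "bad"]),
}


def build_gloss_graph(
    synsets: Dict[str, Tuple[Set[str], List[str]]]
) -> Dict[str, Set[str]]:
    """Add edge si -> sk iff the gloss of si contains a term belonging to sk."""
    term_to_synsets: Dict[str, Set[str]] = {}
    for sid, (terms, _gloss) in synsets.items():
        for term in terms:
            term_to_synsets.setdefault(term, set()).add(sid)

    graph: Dict[str, Set[str]] = {sid: set() for sid in synsets}
    for si, (_terms, gloss) in synsets.items():
        for token in gloss:
            for sk in term_to_synsets.get(token, ()):
                graph[si].add(sk)
    return graph


def pagerank(
    graph: Dict[str, Set[str]],
    damping: float = 0.85,      # assumed value, chosen for illustration
    iterations: int = 50,       # assumed value, chosen for illustration
    seeds: Optional[Iterable[str]] = None,
) -> Dict[str, float]:
    """Power-iteration PageRank; `seeds` optionally biases the teleport vector."""
    nodes = sorted(graph)
    seed_set = set(seeds) if seeds else set(nodes)
    teleport = {v: (1.0 / len(seed_set) if v in seed_set else 0.0) for v in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) * teleport[v] for v in nodes}
        for v in nodes:
            targets = graph[v]
            if targets:
                share = damping * rank[v] / len(targets)
                for w in targets:
                    new_rank[w] += share
            else:
                # Dangling node: spread its mass according to the teleport vector.
                for w in nodes:
                    new_rank[w] += damping * rank[v] * teleport[w]
        rank = new_rank
    return rank


graph = build_gloss_graph(SYNSETS)
ranks = pagerank(graph)
for sid, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{sid}\t{score:.4f}")
```

Passing a set of known positive (or negative) synsets as `seeds` would give one ranking per polarity. That is one possible way to bias the walk, chosen here as an assumption for illustration.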

The strong points of the paper include:

 1. It was the first to apply PageRank to the word (or synset) polarity problem.
 2. It treats positivity and negativity separately, so that it can classify words (or synsets) into three categories: positive, negative, and neutral.

The weak points of the paper include:

 1. The paper defines, solves, and evaluates the problem on WordNet synsets, but synsets are not what we encounter in real text. It would be better if the authors provided a method for mapping words to WordNet synsets and evaluated the proposed method on real-world text.
 2. It does not consider POS tags. The sense of a word can vary considerably across POS tags. As a result, even if a term in sk occurs in the gloss of si, that occurrence does not necessarily carry the meaning of synset sk, so sk might have a different polarity from si.
 3. The out-degree of a node depends on the length of its definition, which has nothing to do with polarity.

Related papers

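  • Paper: Mining WordNet for Fuzzy Sentiment: Sentiment tag extraction from WordNet glosses: http://acl.ldc.upenn.edu/eacl2006/main/papers/13_3_andreevskaiab_262.pdf
  • Paper: Random walks on text structures: http://www.cicling.org/2006/Proceedings/LNCS-3878-Page249.pdf
  • Paper: Using WordNet to measure semantic orientation of adjectives: http://dare.uva.nl/document/154122
  • Paper: SENTIWORDNET: A high-coverage lexical resource for opinion mining: http://ontotext.fbk.eu/Publications/sentiWN-TR.pdf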

Study plan

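  • Article: WordNet: http://en.wikipedia.org/wiki/WordNet
  • Article: PageRank: http://en.wikipedia.org/wiki/Pagerank
  • Paper: WordNet 2: A morphologically and semantically enhanced resource: http://acl.ldc.upenn.edu/W/W99/W99-0501.pdf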