Latent Friend Mining from Blog Data, ICDM 2006

From Cohen Courses

Revision as of 15:38, 1 November 2012

Citation

Latent Friend Mining from Blog Data

Dou Shen, Jian-Tao Sun, Qiang Yang, Zheng Chen

Online version

Latent Friend Mining from Blog Data

Summary

The paper proposed a new problem: finding latent friends for web bloggers. It compared three different algorithms, namely cosine similarity, a topic model, and an ad-hoc two-phase algorithm, to address this problem. Moreover, the authors built a dataset from MSN Spaces to evaluate the three methods.
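The simplest of the three baselines is cosine similarity between bloggers' term-frequency vectors. As a minimal sketch (the function name and the bag-of-words tokenization are illustrative assumptions, not the paper's exact implementation):

```python
import math
from collections import Counter

def cosine_similarity(blog_a: str, blog_b: str) -> float:
    """Cosine similarity between two blogs as bags of words.

    Hypothetical helper: represents each blog as a term-frequency
    vector and returns dot(a, b) / (|a| * |b|).
    """
    va, vb = Counter(blog_a.split()), Counter(blog_b.split())
    # Dot product over the terms the two blogs share.
    dot = sum(va[t] * vb[t] for t in set(va) & set(vb))
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Under this scheme, two bloggers with identical term distributions score 1.0 and bloggers sharing no terms score 0.0; latent friends would be the highest-scoring pairs.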

Discussion

This paper addresses the problem of judging how positive, negative, or neutral a word (here, a WordNet synset) is, which is one of the major tasks in sentiment analysis. The authors proposed to apply the PageRank algorithm to a graph built over WordNet synsets. The intuition is that if a synset sk contributes to the definition of synset si, by virtue of its member terms occurring in the gloss of si, then the polarity of sk contributes to the polarity of si. Accordingly, the graph is G=(V,E), where V is the set of all WordNet synsets and the edge (si -> sk) is in E if and only if the gloss of si contains a term belonging to sk.
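The propagation step over this gloss graph is ordinary PageRank iteration. A minimal sketch, assuming a toy edge list in place of the real WordNet gloss graph (the `pagerank` helper and the damping/iteration settings are illustrative, not the paper's exact configuration):

```python
def pagerank(edges, damping=0.85, iters=50):
    """Iterative PageRank over a directed edge list.

    edges: list of (source, target) pairs, e.g. gloss links
    (si -> sk) when the gloss of si uses a term from sk.
    Returns a dict mapping each node to its rank; ranks sum to 1.
    """
    nodes = {n for e in edges for n in e}
    out = {n: [] for n in nodes}
    for s, t in edges:
        out[s].append(t)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for s in nodes:
            if out[s]:
                # Each node passes its damped rank to its gloss links.
                share = damping * rank[s] / len(out[s])
                for t in out[s]:
                    new[t] += share
            else:
                # Dangling node: spread its rank uniformly.
                for t in nodes:
                    new[t] += damping * rank[s] / len(nodes)
        rank = new
    return rank
```

In the paper's setting, running two such walks seeded with positive and negative synsets separately is what allows the positive/negative/neutral three-way classification discussed below.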

The strong points of the paper include:

 1. It was the first to introduce PageRank to the word (or synset) polarity problem.
 2. It considered positivity and negativity separately, so it can classify words (or synsets) into three categories: positive, negative, and neutral.

The weak points of the paper include:

 1. The paper defined, solved, and evaluated the problem on WordNet synsets, but WordNet synsets are not what we encounter in real text. It might have been better if the authors had provided a method to map words to WordNet synsets and evaluated the proposed method on real-world text.
 2. It did not consider POS tags. The sense of a word can vary greatly across POS tags. As a result, even if a term in sk occurs in the gloss of si, that term does not necessarily carry the meaning of synset sk, so sk might have a different polarity than si.
 3. The out-degree of a node depends on the length of the gloss, which has nothing to do with polarity.

Study plan

 * Article: cosine similarity
 * Article: KL divergence
 * Article: topic model