Latent Friend Mining from Blog Data, ICDM 2006


Citation

Latent Friend Mining from Blog Data

Dou Shen, Jian-Tao Sun, Qiang Yang, Zheng Chen

Online version

Latent Friend Mining from Blog Data

Summary

The paper proposes a new problem: finding latent friends for web bloggers. It compares three algorithms for this problem: cosine similarity, a topic model, and an ad-hoc two-phase algorithm. The authors also build a dataset from MSN Spaces to evaluate the three methods.

Discussion

This paper proposes the novel problem of finding latent friends among web bloggers based on their interests. The authors give a formal definition of "latent friend" and explain why the problem matters. The paper proposes and compares three methods:

1. Cosine similarity. This approach builds a bag-of-words vector for each user and measures the "friendship" of two users by the cosine similarity between their word vectors.

2. Topic model. Each user is represented by a topic distribution, and "friendship" is based on the KL divergence between the two distributions.

3. Ad-hoc two-phase algorithm. In the first phase, the authors calculate similarity at the topic level, where the topics come from a predefined hierarchy. In the second phase, they calculate similarity within each topic.
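
The two scoring functions behind the first two methods can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the raw term-count weighting, the `eps` smoothing constant, and the symmetrized form of the KL divergence are all assumptions made here for the example. Note that a higher cosine similarity means two bloggers are closer, while a lower KL divergence means they are closer.

```python
import math
from collections import Counter


def cosine_similarity(words_a, words_b):
    """Cosine similarity between two bag-of-words vectors built from token lists.

    Raw term counts are used as weights (an assumption; other weightings
    such as TF-IDF would plug in the same way).
    """
    a, b = Counter(words_a), Counter(words_b)
    # Only words shared by both users contribute to the dot product.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty blog is treated as similar to nothing
    return dot / (norm_a * norm_b)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two topic distributions given as equal-length lists.

    eps is a hypothetical smoothing constant that avoids log(0).
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def symmetric_kl(p, q):
    """Average of KL(p || q) and KL(q || p).

    KL divergence is asymmetric, so this sketch symmetrizes it to get an
    order-independent score; that choice is an assumption of the example.
    """
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
```

For example, `cosine_similarity(blog_a_tokens, blog_b_tokens)` could be used to rank candidate friends in descending order, while `symmetric_kl(topics_a, topics_b)` would rank them in ascending order.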


After presenting the algorithms, the authors build a dataset from MSN Spaces to evaluate the three methods and find that the ad-hoc two-phase algorithm works best.

Study plan
