Latent Friend Mining from Blog Data, ICDM 2006
Citation
Latent Friend Mining from Blog Data
Dou Shen, Jian-Tao Sun, Qiang Yang, Zheng Chen
Online version
Latent Friend Mining from Blog Data
Summary
The paper proposed a new problem: finding latent friends for web bloggers. It compared three algorithms for this problem: cosine similarity, a topic model, and an ad-hoc two-phase algorithm. The authors also built a dataset from MSN Spaces to evaluate the three methods.
Discussion
This paper proposed the novel problem of finding latent friends among web bloggers based on their interests. The authors gave a formal definition of "latent friend" and explained why this new problem matters. Three methods were proposed and compared:
1. Cosine similarity. This approach builds a bag-of-words vector for each user. The "friendship" of two users is measured by the cosine similarity between their word vectors.
2. Topic model. Each user is represented by a distribution over topics. "Friendship" is based on the KL divergence between the two users' distributions, where a smaller divergence indicates more similar interests.
3. Ad-hoc two-phase algorithm. In the first phase, the authors calculate similarity at the topic level, where the topics come from a predefined hierarchy. In the second phase, they calculate similarity within each topic.
After presenting the algorithms, the authors built a dataset from MSN Spaces to evaluate the three methods. They found that the ad-hoc two-phase algorithm worked best.
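The two single-level measures in items 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The following are assumptions of this sketch:
- the lowercase whitespace tokenizer;
- the `eps` guard against zero probabilities;
- the symmetrised KL score.

The topic distributions are taken as given, for example from some topic model, rather than learned here.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Turn a user's text into term counts (assumed: lowercase whitespace tokenization)."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    # Counter returns 0 for missing terms, so only shared terms add to the dot product.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile shares nothing with anyone
    return dot / (norm_a * norm_b)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete topic distributions of equal length.

    Terms with p_i == 0 contribute nothing; eps (an assumption) avoids
    division by zero when q has a zero entry.
    """
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def symmetric_kl(p, q):
    """Average of KL in both directions (assumed symmetrisation); lower = closer interests."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))

# Usage: two hypothetical bloggers.
alice = bag_of_words("hiking photos hiking trails")
bob = bag_of_words("trail hiking gear")
print(cosine_similarity(alice, bob))
print(symmetric_kl([0.7, 0.2, 0.1], [0.6, 0.3, 0.1]))
```

KL divergence is asymmetric, so the sketch averages both directions to obtain a single score per pair. When ranking candidate friends, a higher `cosine_similarity` means closer interests, while for `symmetric_kl` a lower value does.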
Study plan
- Article: cosine similarity
- Article: KL_Divergence
- Article: topic model