Xufei Wang, ICDM, 2010

Citation

Xufei Wang. 2010. Discovering Overlapping Groups in Social Media. In Proceedings of the 10th IEEE International Conference on Data Mining (ICDM 2010).

Online Version

http://dmml.asu.edu/users/xufei/Papers/ICDM2010.pdf

Databases

BlogCatalog [1]

Delicious [2]

Summary

In this paper, the authors propose a novel co-clustering framework that uses the network of connections between users and tags in social media to discover overlapping communities. The basic ideas are:

To discover overlapping communities in social media. The diverse interests and interactions that people have in online social life suggest that one person often belongs to more than one community.

To use user-tag subscription information instead of user-user links. Metadata such as tags is an important source for measuring user-user similarity. The paper shows that more accurate community structures can be obtained by taking tag information into account.

To obtain clusters containing users and tags simultaneously. Existing co-clustering methods cluster users and tags separately, so it is not clear which user cluster corresponds to which tag cluster. The proposed method finds the user and tag group structure together with the correspondence between the two.

Brief description of the method

In this paper, the concept of community is generalized to include both users and tags. The tags of a community indicate the main interests of the people within it.

Let $\mu =\left(\mu _{1},\mu _{2},\ldots ,\mu _{m}\right)$ denote the user set and $\tau =\left(\tau _{1},\tau _{2},\ldots ,\tau _{n}\right)$ the tag set. A community $C_{i}\left(1\leq i\leq k\right)$ is a subset of users and tags, where $k$ is the number of communities. As mentioned above, communities usually overlap, i.e., $C_{i}\cap C_{j}\neq \emptyset$ can hold for distinct communities $\left(1\leq i,j\leq k,\ i\neq j\right)$. On the other hand, users and their subscribed tags form an $m\times n$ user-tag matrix $M$, in which each entry $M_{ij}\in \left\{0,1\right\}$ indicates whether user $\mu _{i}$ subscribes to tag $\tau _{j}$. So it is reasonable to view each user as a sparse vector of tags (a row of $M$), and each tag as a sparse vector of users (a column of $M$).
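
The user-tag matrix described above can be illustrated with a minimal Python sketch. The toy subscription data and the helper names (`subscriptions`, `cosine`, `column`) are hypothetical and not taken from the paper. Cosine similarity is used here only as one plausible way to compare two users through their tag vectors; it is an assumption of this sketch, not necessarily the measure the authors rely on.

```python
from math import sqrt

# Hypothetical toy data (not from the paper): user -> set of subscribed tags.
subscriptions = {
    "alice": {"python", "ml"},
    "bob": {"ml", "travel"},
    "carol": {"travel", "food"},
}

# Fix an ordering of users (rows) and tags (columns).
users = sorted(subscriptions)
tags = sorted(set().union(*subscriptions.values()))

# Binary user-tag matrix M: M[i][j] = 1 if user i subscribes to tag j, else 0.
M = [[1 if t in subscriptions[u] else 0 for t in tags] for u in users]


def column(matrix, j):
    """Return column j of the matrix, i.e. a tag viewed as a vector of users."""
    return [row[j] for row in matrix]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# A user is a row of M (a vector over tags); a tag is a column of M (a vector over users).
alice, bob, carol = M
ml_tag = column(M, tags.index("ml"))

print(tags)                  # column order of M
print(M)                     # rows follow the order in `users`
print(cosine(alice, bob))    # users sharing one of two tags
print(cosine(alice, carol))  # users sharing no tags
```

In this toy example two users have no direct link, yet a shared tag still makes them similar, which is the intuition behind using user-tag subscriptions in place of user-user links.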