Difference between revisions of "Xufei Wang, ICDM, 2010"
Line 50: | Line 50: | ||
<math>S_{e}\left ( e,{e}' \right )= \frac{1}{2}\left ( \delta \left ( u_{i},u_{j} \right )+\delta \left ( t_{p},t_{q} \right ) \right )</math> | <math>S_{e}\left ( e,{e}' \right )= \frac{1}{2}\left ( \delta \left ( u_{i},u_{j} \right )+\delta \left ( t_{p},t_{q} \right ) \right )</math> | ||
− | '''Normalized Learning''' | + | '''B. Normalized Learning''' |
Let <math>d_{u_{i}}</math> denote the degree of the user <math>u_{i}</math>,and <math>d_{t_{p}}</math> represent the degree of tag <math>t_{p}</math> in a user-tag network. their cosine similarity can be rewritten as: | Let <math>d_{u_{i}}</math> denote the degree of the user <math>u_{i}</math>,and <math>d_{t_{p}}</math> represent the degree of tag <math>t_{p}</math> in a user-tag network. their cosine similarity can be rewritten as: | ||
<math>S_{e}\left ( e,{e}' \right )= \frac{d_{t_{p}}d_{t_{q}}\delta \left ( u_{i},u_{j} \right )+d_{u_{i}}d_{u_{j}}\delta \left ( t_{p},t_{q} \right )}{\sqrt{d_{u_{i}}^{2}+d_{t_{p}}^{2}}\sqrt{d_{u_{j}}^{2}+d_{t_{q}}^{2}}}</math> | <math>S_{e}\left ( e,{e}' \right )= \frac{d_{t_{p}}d_{t_{q}}\delta \left ( u_{i},u_{j} \right )+d_{u_{i}}d_{u_{j}}\delta \left ( t_{p},t_{q} \right )}{\sqrt{d_{u_{i}}^{2}+d_{t_{p}}^{2}}\sqrt{d_{u_{j}}^{2}+d_{t_{q}}^{2}}}</math> | ||
+ | |||
+ | '''C. Correlational Learning''' | ||
+ | |||
+ | The singular value decomposition of user-tag network M is given by <math>M= U\sum V^{T}</math>, where columns of U and V are the left and right singular vectors and <math>\sum </math> is the diagonal matrix whose elements are singular values. | ||
== Experimental Result == | == Experimental Result == | ||
== Related papers == | == Related papers == |
Revision as of 23:33, 27 March 2011
Contents
Citation
Xufei Wang. 2010. Discovering Overlapping Groups in Social Media, the 10th IEEE International Conference on Data Mining (ICDM 2010).
Online Version
http://dmml.asu.edu/users/xufei/Papers/ICDM2010.pdf
Databases
Summary
In this paper, the authors propose a novel co-clustering framework, which takes advantage of networking information between users and tags in social media, to discover these overlapping communities. The basic ideas are:
- To discover overlapping communities in social media. Diverse interests and interactions that human beings can have in online social life suggest that one person often belongs more than one community.
- To use user-tag subscription information instead of user-user links. Metadata such as tags become an important source in measuring the user-user similarity. The paper shows that more accurate community structures can be obtained by scrutinizing tag information.
- To obtain clusters containing users and tags simultaneously. Existing co-clustering methods cluster users/tags separately. Thus, it is not clear which user cluster corresponds to which tag cluster. But the proposed method is able to find out user/tag group structure and their correspondence
Problem Statement
In this paper, the concept of community is generalized to include both users and tags. Tags of a community imply the major concern of people within it.
Let denote the user set, the tay set. A community is a subset of user and tags, where k is the number of communities. As mentioned above, communities usually overlap, i.e., .On the other hand, users and their subscribed tags form a user-tag matrix M, in which each entry indicates whether user subscribes to tag . So it is reasonable to view a user as a sparse vector of tags, and each tag as a sparse vector of users.
Given notations above, the overlapping co-clustering problem can be stated formally as follows:
Input:
- A user-tag subscription matrix , where and are the numbers of users and tags.
- The number of communities k.
Output:
- k overlapping communities which consist of both users and tags.
Brief description of the method
Communities that aggregate similar users and tags together can be detected by maximizing intra-cluster similarity, which is shown below: where k is the number of communities, x is the edges and c is the centroid of community. This formulation can be solved by a k-means variant.
This paper uses different methods to solve the problem of overlapping communities:
A. Independent Learning
If two tags are different, their similarity can be defined as 0, and 1 if they are the same. their cosine similarity can be rewritten as:
B. Normalized Learning
Let denote the degree of the user ,and represent the degree of tag in a user-tag network. their cosine similarity can be rewritten as:
C. Correlational Learning
The singular value decomposition of user-tag network M is given by , where columns of U and V are the left and right singular vectors and is the diagonal matrix whose elements are singular values.