Difference between revisions of "Wang et al ICDM 2010"
Line 15: | Line 15: | ||
The authors propose the following general strategy. Suppose that there are a collection of users and tags available in a social network. Tags are features that are now common in social networks, and can often be associated with users easily. Thus, we can construct a graph in which users and tags are nodes, and edges are only allowed between one tag node and one user node. The authors then propose to create overlapping communities by clustering the edges, rather than the nodes themselves. | The authors propose the following general strategy. Suppose that there are a collection of users and tags available in a social network. Tags are features that are now common in social networks, and can often be associated with users easily. Thus, we can construct a graph in which users and tags are nodes, and edges are only allowed between one tag node and one user node. The authors then propose to create overlapping communities by clustering the edges, rather than the nodes themselves. | ||
− | Formally, we seek to create a clustering <math>C = \{C_1,\ldots,C_k\}<\math> which satisfies: | + | Formally, we seek to create a clustering <math>C = \{ C_1,\ldots,C_k \} <\math> which satisfies: |
<math> | <math> |
Revision as of 13:50, 12 February 2011
Citation
Xufei Wang, Lei Tang, Huiji Gao, and Huan Liu. Discovering Overlapping Groups in Social Media. In Proceedings of The 10th IEEE International Conference on Data Mining (ICDM'10), 2010.
Online version
Available on Lei Tang's Website
Summary
This paper focuses on clustering with the purpose of discovering communities or groups in social media. The authors observe that many communities in social networks are overlapping -- users may belong to many communities at once. Previous methods give solutions that partition the network, and are not flexible enough to instead provide a set of overlapping subsets. Their method, however, relies on using other meta-information about the users. This work gives an alternate solution to the problem addressed by co-clustering. The authors introduce a general framework, and give specific applications that employ Cosine similarity, Latent semantic indexing, and Spectral Clustering.
General Framework
The authors propose the following general strategy. Suppose that there are a collection of users and tags available in a social network. Tags are features that are now common in social networks, and can often be associated with users easily. Thus, we can construct a graph in which users and tags are nodes, and edges are only allowed between one tag node and one user node. The authors then propose to create overlapping communities by clustering the edges, rather than the nodes themselves.
Formally, we seek to create a clustering Failed to parse (unknown function "\math"): {\displaystyle C = \{ C_1,\ldots,C_k \} <\math> which satisfies: <math> \arg \max_C \frac{1}{k} \sum_{i=1}^k \sum_{x_j \in C_i} S(x_j,c_i) }
Here, <math>x_j <\math> is an edge, <math> c_i<\math> is the center of cluster <math>i<\math>, and <math> S(\cdot,\cdot)<\math> is a similarity measure. Since each edge is between one tag and one user, the authors propose to define <math>S(\cdot,\cdot)<\math> as a convex combination between a tag similarity and a user similarity:
<math> S(e_1, e_2) = \alpha S_u(u_1,u_2) + (1 -\alpha )S_t(t_1, t_2) <\math>