# Wang et al ICDM 2010

## Citation

Xufei Wang, Lei Tang, Huiji Gao, and Huan Liu. Discovering Overlapping Groups in Social Media. In Proceedings of The 10th IEEE International Conference on Data Mining (ICDM'10), 2010.

## Summary

This paper focuses on clustering with the purpose of discovering communities or groups in social media. The authors observe that many communities in social networks are overlapping -- users may belong to many communities at once. Previous methods give solutions that partition the network, and are not flexible enough to instead provide a set of overlapping subsets. Their method, however, relies on using other meta-information about the users. This work gives an alternate solution to the problem addressed by co-clustering. The authors introduce a general framework, and give specific applications that employ Cosine similarity, Latent semantic indexing, and Spectral Clustering.

### General Framework

The authors propose the following general strategy. Suppose that there are a collection of users and tags available in a social network. Tags are features that are now common in social networks, and can often be associated with users easily. Thus, we can construct a graph in which users and tags are nodes, and edges are only allowed between one tag node and one user node. The authors then propose to create overlapping communities by clustering the edges, rather than the nodes themselves.

Formally, we seek to create a clustering ${\displaystyle C=\{C_{1},\ldots ,C_{k}\}}$ which satisfies:

${\displaystyle \arg \max _{C}{\frac {1}{k}}\sum _{i=1}^{k}\sum _{x_{j}\in C_{i}}S(x_{j},c_{i})}$

Here, ${\displaystyle x_{j}}$ is an edge, ${\displaystyle c_{i}}$ is the center of cluster ${\displaystyle i}$, and ${\displaystyle S(\cdot ,\cdot )}$ is a similarity measure. Since each edge is between one tag and one user, the authors propose to define ${\displaystyle S(\cdot ,\cdot )}$ as a convex combination between a tag similarity and a user similarity:

${\displaystyle S(e_{1},e_{2})=\alpha S_{u}(u_{1},u_{2})+(1-\alpha )S_{t}(t_{1},t_{2})}$

The authors propose three specific ways to measure this similarity. First, they simply assign a delta function to the similarities of users and tags. This reduces the similarity to the familiar Cosine similarity. Second, they propose normalizing the similarities with the degrees of each tag and node. Finally, they propose a method termed correlational learning, which uses Spectral clustering to first construct a semantic space for both users and tags. After projecting the user and tag into their respective spaces, they compute the similarity using cosine similarity.