Tang et al.: ICDM 2009

From Cohen Courses
Jump to navigationJump to search

Citation

Lei Tang, Xufei Wang, and Huan Liu, "Uncovering Groups via Heterogeneous Interaction Analysis", Proceedings of the Ninth IEEE International Conference on Data Mining, 2009

Online version

[1]

Summary

This paper aims at Community Detection in multi-dimensional networks. They try to integrate the network information of multiple dimensions in order to discover cross-dimension group structures. The method is a two-phase strategy to identify the hidden structures shared across dimensions. At fist the structural features from each dimension of the network is extracted via modularity analysis, and then the features are integrated to find out a community structure among actors.

Method

The authors started by defining Modularitymatrix for one dimensional social networks and then defined modularity for m-dimensional networks based on that. To do this, they use Average modularity maximization, which tries to calculate the average interaction network among all actors and optimize the modularity for the obtained network. Another method they try is Total Modularity Maximization in which the total modularity among all the dimensions is maximized. This strategy considers the degree distribution in each dimension separately whereas the previous average modularity maximization does not distinguish degree distributions in any dimension. They use these two methods as baseline to compare their main method (Principal Modularity Maximization) with.

PRINCIPAL MODULARITY MAXIMIZATION (PMM)

This method solves the comparability problem of different dimensions. It consists of two steps: structural feature extraction and cross-dimension integration. In the first step, Structural features are defined as the network-extracted dimensions that are indicative of community structure. They can be computed by a low-dimensional embedding using the top eigenvectors of the modularity matrix. In Cross-Dimension Integration, they base the method on the expectation that the extracted structural features should be similar in different dimensions and result that minimizing the difference among features of various dimensions is equal to doing PCA on them. In summary, they first extract structural features from each dimension of the network via modularity maximization; then PCA is applied on the concatenated data to select the top eigenvectors. Afterwards, a simple clustering algorithm such as K-means is used to group all the actors based on these features.

Data and Evaluation

They used two types of experiments: one on synthetic data, and one on real data. They evaluated the synthetic data experiments by using NMI as a measure. For experiments on social data, they used YouTube data with 15,088 active user profiles. The evaluation of this data was how (d-1) dimensions of network can predict the dth dimension of it.