Xiang et al., 2010, Modeling Relationship Strength in Online Social Networks

Online version

An online version of this paper is available at the [ACM digital library].

Summary

This paper investigates unsupervised models for Determining Social Network Attributes, more specifically link strength in social networks. Previous work on friendship relations mostly assumes a binary relation (connected or not connected). The authors argue, however, that real-life networks are more complicated, with acquaintances and best friends mixed together. They develop an unsupervised model that estimates the strength of these relations from features such as bi-directional communication and user similarity. Their approach is evaluated on Facebook data, where the estimated strengths improve classification accuracy.

Key Contributions

Unsupervised model for predicting relationship strength.

Background

This work is based on the principle of homophily, which states that two people with similar characteristics tend to form stronger ties than two people with nothing in common. Relationship strength is in turn assumed to directly affect the frequency of online communication, such as emails and direct messages on Facebook.

Models

Latent variable model

The authors define the following latent variable model:

Xiang.PNG Source: the original paper

where xi and xj are the feature vectors of user i and user j. In their experiments, the authors use three features:

  1. the logarithms of the normalized counts of common networks
  2. the logarithms of the normalized counts of common groups
  3. the logarithms of the normalized counts of common friends
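
As an illustration of how such features might be computed, here is a minimal Python sketch. The summary does not say how the counts are normalized, so the Jaccard-style ratio, the smoothing constant, and the names log_normalized_common_count and similarity_features are all assumptions made for this example, not the authors' definitions.

```python
import math

def log_normalized_common_count(items_i, items_j, smoothing=1.0):
    """Log of a normalized count of items shared by two users.

    Assumption: the count is normalized by the size of the union (a
    Jaccard-style ratio) and smoothed so the logarithm is always defined.
    """
    a, b = set(items_i), set(items_j)
    common = len(a & b)
    union = len(a | b)
    return math.log((common + smoothing) / (union + smoothing))

def similarity_features(user_i, user_j):
    """Build the three-feature vector (networks, groups, friends) for a pair."""
    return [
        log_normalized_common_count(user_i[key], user_j[key])
        for key in ("networks", "groups", "friends")
    ]

# Hypothetical profiles, used only to exercise the functions.
alice = {"networks": {"Purdue"}, "groups": {"chess", "cs"},
         "friends": {"bob", "carol", "dave"}}
bob = {"networks": {"Purdue"}, "groups": {"chess"},
       "friends": {"alice", "carol"}}
features = similarity_features(alice, bob)
```

With this normalization every feature is at most 0, and it equals 0 only when the two users share exactly the same set.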

These two vectors are then used to define zij, the latent variable representing the strength of the relationship between users i and j. In turn, this latent variable conditions the probability of the observations y, which are concrete interactions between the two users (e.g., tagging, sending emails).
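
The generative reading of this model can be sketched in a few lines of Python. The sketch assumes a Gaussian strength centered on a linear function of the pair's features and a logistic probability for each interaction type. These distributional choices, and the names sample_pair, w, thetas, biases, and noise_std, are illustrative assumptions and are not taken from this summary.

```python
import math
import random

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def sample_pair(features, w, thetas, biases, noise_std=0.5, rng=random):
    """Sample a latent strength z and binary interactions y for one user pair.

    Assumed model (not confirmed by the summary):
      z   ~ Normal(w . features, noise_std^2)
      y_t ~ Bernoulli(sigmoid(theta_t * z + bias_t)) for each interaction type t
    """
    mean = sum(wk * sk for wk, sk in zip(w, features))
    z = rng.gauss(mean, noise_std)
    probs = [sigmoid(theta * z + b) for theta, b in zip(thetas, biases)]
    ys = [1 if rng.random() < p else 0 for p in probs]
    return z, probs, ys

# Hypothetical parameter values; two interaction types (e.g. wall post, photo tag).
rng = random.Random(0)
z, probs, ys = sample_pair([-0.4, -0.9, -0.2], w=[1.0, 0.5, 2.0],
                           thetas=[1.5, 0.8], biases=[0.0, -1.0], rng=rng)
```

Under these assumptions, a larger z raises the probability of every interaction type whose theta is positive, which matches the idea that stronger relationships produce more interactions.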

Inference

The inference procedure is described in full in the paper and is too involved to summarize here. It is worth noting, however, that the authors use a coordinate ascent approach in their experiments to estimate the parameters of the model.
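
To give a feel for coordinate ascent in this setting, the following Python sketch alternates between two blocks: it updates the latent strengths with the parameters held fixed, then updates the parameters with the strengths held fixed. It is a simplified stand-in and not the authors' procedure. It assumes a Gaussian strength with a linear mean and logistic interaction likelihoods, and it uses plain gradient steps inside each block. All names (coordinate_ascent, log_joint, lr, var) and the synthetic data are hypothetical.

```python
import math
import random

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def log_joint(S, Y, z, w, theta, b, var=1.0):
    """Log joint of strengths z and interactions Y, up to an additive constant."""
    total = 0.0
    for s, y, zi in zip(S, Y, z):
        mean = sum(wk * sk for wk, sk in zip(w, s))
        total -= (zi - mean) ** 2 / (2.0 * var)
        for yt, th, bt in zip(y, theta, b):
            p = sigmoid(th * zi + bt)
            total += math.log(p) if yt else math.log(1.0 - p)
    return total

def coordinate_ascent(S, Y, n_iters=50, lr=0.05, var=1.0):
    n, d, T = len(S), len(S[0]), len(Y[0])
    z, w = [0.0] * n, [0.0] * d
    theta, b = [1.0] * T, [0.0] * T
    history = [log_joint(S, Y, z, w, theta, b, var)]
    for _ in range(n_iters):
        # Block 1: one gradient step on each latent strength, parameters fixed.
        for i in range(n):
            mean = sum(wk * sk for wk, sk in zip(w, S[i]))
            grad = -(z[i] - mean) / var
            for yt, th, bt in zip(Y[i], theta, b):
                grad += (yt - sigmoid(th * z[i] + bt)) * th
            z[i] += lr * grad
        # Block 2: one averaged gradient step on the parameters, strengths fixed.
        gw, gth, gb = [0.0] * d, [0.0] * T, [0.0] * T
        for i in range(n):
            resid = (z[i] - sum(wk * sk for wk, sk in zip(w, S[i]))) / var
            for k in range(d):
                gw[k] += resid * S[i][k] / n
            for t in range(T):
                err = Y[i][t] - sigmoid(theta[t] * z[i] + b[t])
                gth[t] += err * z[i] / n
                gb[t] += err / n
        w = [wk + lr * g for wk, g in zip(w, gw)]
        theta = [th + lr * g for th, g in zip(theta, gth)]
        b = [bt + lr * g for bt, g in zip(b, gb)]
        history.append(log_joint(S, Y, z, w, theta, b, var))
    return z, w, theta, b, history

# Synthetic data: 40 pairs, 2 features, 2 interaction types.
rng = random.Random(0)
S = [[rng.random(), rng.random()] for _ in range(40)]
Y = [[1 if rng.random() < sigmoid(4.0 * s[0] - 2.0) else 0 for _ in range(2)]
     for s in S]
z, w, theta, b, history = coordinate_ascent(S, Y)
```

The history list records the objective after each round, which makes it easy to check that the alternating updates are improving the fit. A practical implementation would add regularization and a convergence test, both omitted here for brevity.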

Experiments and Evaluation

To evaluate the model, the authors extracted a dataset from the Purdue Facebook network consisting of 4500 users and 144,712 pairs of users. For every pair of users, they compute the three features described above and infer the parameters of the latent variable model. They then evaluate the quality of the model on a classification task: they leave some nodes in the graph unlabeled and use Conditional Random Fields to propagate information from the known nodes to the other nodes, using the latent variable (the strength of the relationship) as the weight on the edges of the graph. The following image shows the results for four classification tasks:

  1. gender
  2. relationship status
  3. political view
  4. religious view


Xiang-results.PNG Source: the original paper

The relationship-strength graph is compared to four other Facebook graphs (friendship, top-friend, wall, picture) as well as two heuristic graphs: profile-similarity and interaction-count (defined using wall links). The latent variable model outperforms the six other graphs on the different classification tasks. The authors therefore conclude that the latent variable (the relationship strength) is useful for characterizing social networks.
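
The role of the edge weights in the evaluation can be illustrated with a small Python sketch. Note that it does not implement Conditional Random Fields. It swaps in a much simpler weighted-neighbor averaging scheme, in which each unlabeled node repeatedly takes the strength-weighted average of its neighbors' scores. The function name propagate_labels and the toy graph are hypothetical.

```python
def propagate_labels(edges, known, n_iters=20):
    """Estimate P(label = 1) for unlabeled nodes by weighted neighbor averaging.

    edges: dict mapping (u, v) pairs to a non-negative strength weight.
    known: dict mapping labeled nodes to 0 or 1.
    """
    neighbors = {}
    for (u, v), weight in edges.items():
        neighbors.setdefault(u, []).append((v, weight))
        neighbors.setdefault(v, []).append((u, weight))
    # Unlabeled nodes start at 0.5 (no information); labeled nodes are clamped.
    scores = {node: float(known.get(node, 0.5)) for node in neighbors}
    for _ in range(n_iters):
        updated = dict(scores)
        for node, nbrs in neighbors.items():
            if node in known:
                continue  # labeled nodes keep their known value
            total = sum(weight for _, weight in nbrs)
            if total > 0:
                updated[node] = sum(weight * scores[nbr]
                                    for nbr, weight in nbrs) / total
        scores = updated
    return scores

# Toy chain a - b - c - d with a strong a-b tie and a weak b-c tie.
edges = {("a", "b"): 0.9, ("b", "c"): 0.1, ("c", "d"): 0.8}
known = {"a": 1, "d": 0}
scores = propagate_labels(edges, known)
```

In this toy chain, node b ends up close to the label of a and node c close to the label of d, because the weak b-c tie passes little information. This is the intuition behind using relationship strength, instead of binary friendship, as the edge weight.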