Difference between revisions of "Xiang et al., 2010,Modeling Relationship Strength in Online Social Networks"

From Cohen Courses
Jump to navigationJump to search
m (Created page with '== Online version == An online version of this paper is available at the [[http://portal.acm.org/ft_gateway.cfm?id=1718518&type=pdf&CFID=7653019&CFTOKEN=17876787 ACM digital lib…')
 
 
(7 intermediate revisions by the same user not shown)
Line 5: Line 5:
 
== Summary ==
 
== Summary ==
  
[[AddressesProblem::Influence propagation]] in social networks has interesting applications, especially for viral marketing.  Most past studies assume as input a graph with nodes for each person, and weighted edges between the nodes if there is influence between the two persons. However, less attention has been put on how to build this graph using social media data. This [[Category::paper]] introduces a model of influence built using social graph data on one hand, and a log of action (e.g., joining a community) on the other hand. The model is validated on the [[UsesDataset::Flick dataset]], which consists in a social graph with 1.3M nodes/40M edges and action log of 300K distinct actions. They propose and evaluate both static and dynamic models for this problem, and show that the influence of users on others (e.g., influence another user to join a group) can be modeled with a high accuracy.
+
This [[category::paper]] investigates unsupervised models for [[AddressesProblem::Determining Social Network Attributes]], more specifically, link strength in social network. Previous work focusing on friendship relations mostly assumes binary  relation (connected or not connected). However, the authors argue that real-life network is a more complicated environment, where acquaintances and best-friends relations are mixed together. They develop an unsupervised model to estimate the strenght of these relations by using features such as bi-directional communication as well as user similarity.  Their approach is evaluated on Facebook, and shows an improved classification accuracy.
  
 
== Key Contributions ==
 
== Key Contributions ==
  
The biggest contribution claimed by the authors in this paper is the empirical evidence of influence in social networks using a discrete time model.
+
Unsupervised model for predicting relationship strength.
 +
 
 +
== Background ==
 +
This work is based on the principle of homophily, which states that two persons who have similar characteristics tend to tie to each other more strongly than two persons with no similarity. This relationship strength is assumed to impact directly the frequency of online communications, such as emails and direct messages in Facebook.
  
 
== Models ==
 
== Models ==
  
=== Static Model ===
+
=== Latent variable model ===  
Everytime a user tries to activate (i.e. influence) his neighbor, he has a probability Pi of succeeding.  This can be modeled using a Bernouilli distribution.
+
The authors define the following latent variable model:
 +
 
 +
[[File:Xiang.PNG]] Source: the original paper
 +
 
 +
where xi and xj are the feature vectors from the user i and user j. In their experiment, they use 3 different features:
  
=== Continuous Time ===
+
#the logarithms of the normalized counts of common networks
The previous model neglect the variation over time of the influence users have on each others.  For this reason, Goyal et al, present a continuous time model where the probability of a user influencing a neighbor depends on a time t variable (e.g., similar to an exponential decay model).
+
#the logarithms of the normalized counts of common groups
 +
#the logarithms of the normalized counts of common friends
  
=== Discrete Time ===
+
These 2 vectors are then used to define zij, which is the latent variable representing the strength of the relation between the user i and j. In turn, this latent variable conditions probability of observations ys, which are concrete interaction between the two users (e.g., tagging, sending emails, etc.). 
The above continuous model isn't truly continuous has it has to be simulated using increment on the variable t.  This process requires long run time for testingFor this reason, the authors present a discrete time model which assumes that the probability of a user influencing his neighbors his constant over a certain window of time after an action.
+
 
 +
=== Inference ===
 +
The details of the inference are detailed in the paper and too complex to be summarized hereHowever, it is worth noting that the authors detail a coordinate ascent approach that is used in the experiment to estimate the parameters of the models.
  
 
== Experiments and Evaluation ==
 
== Experiments and Evaluation ==
  
The different models are evaluated with a ROC curve with true positive rate and false positive rate varying for different threshold values.  The threshold value controls how much influence a user need to have from his neighbors in order to do the action (e.g., join a group). The result show that the static model fail to perform as well as the two temporally aware models, which probably indicates that influence does vary over time in social networks. Also, although discrete and continuous time models show similar performance, the discrete model has a faster run time.
+
For the evaluation of the model, the authors extracted a dataset from the [[UsesDataset::Purdue Facebook network]] which consists of 4500 users and 144,712 pairs between the users. For all pair of users, they compute the 3 features described above, and infer the parameters of the latent variable model.  The authors evaluate the quality of the model on a classification task: the leave some nodes in the graph unprocessed, and use [[UsesMethod::Conditional Random Fields]] to propagate the information from known node to the other nodes (using the latent variable, the strength of the relationships, as weight on the edges of the graph). The following image shows the results for 4 different classification tasks:
 +
 
 +
#gender
 +
#relationship status
 +
#political view
 +
#religious view
 +
 
 +
 
 +
[[File:Xiang-results.PNG]] Source: the original paper
  
[[File:Result_learning_influence_probabilities.jpg]] Source: the original paper
+
The relationship strength is compared to 4 other Facebook graphs (friendship, top-friend, wall, picture) as well as two heuristic graphs: profile-similarity and interaction-count (defined using wall links). The latent variable model outperforms the 6 others graphs on the different classification tasks.  Thus, the authors conclude that the latent variable (the relationship strength), is useful at characterizing social networks.

Latest revision as of 19:23, 25 March 2011

Online version

An online version of this paper is available at the [ACM digital library].

Summary

This paper investigates unsupervised models for Determining Social Network Attributes, more specifically, link strength in social network. Previous work focusing on friendship relations mostly assumes binary relation (connected or not connected). However, the authors argue that real-life network is a more complicated environment, where acquaintances and best-friends relations are mixed together. They develop an unsupervised model to estimate the strenght of these relations by using features such as bi-directional communication as well as user similarity. Their approach is evaluated on Facebook, and shows an improved classification accuracy.

Key Contributions

Unsupervised model for predicting relationship strength.

Background

This work is based on the principle of homophily, which states that two persons who have similar characteristics tend to tie to each other more strongly than two persons with no similarity. This relationship strength is assumed to impact directly the frequency of online communications, such as emails and direct messages in Facebook.

Models

Latent variable model

The authors define the following latent variable model:

Xiang.PNG Source: the original paper

where xi and xj are the feature vectors from the user i and user j. In their experiment, they use 3 different features:

  1. the logarithms of the normalized counts of common networks
  2. the logarithms of the normalized counts of common groups
  3. the logarithms of the normalized counts of common friends

These 2 vectors are then used to define zij, which is the latent variable representing the strength of the relation between the user i and j. In turn, this latent variable conditions probability of observations ys, which are concrete interaction between the two users (e.g., tagging, sending emails, etc.).

Inference

The details of the inference are detailed in the paper and too complex to be summarized here. However, it is worth noting that the authors detail a coordinate ascent approach that is used in the experiment to estimate the parameters of the models.

Experiments and Evaluation

For the evaluation of the model, the authors extracted a dataset from the Purdue Facebook network which consists of 4500 users and 144,712 pairs between the users. For all pair of users, they compute the 3 features described above, and infer the parameters of the latent variable model. The authors evaluate the quality of the model on a classification task: the leave some nodes in the graph unprocessed, and use Conditional Random Fields to propagate the information from known node to the other nodes (using the latent variable, the strength of the relationships, as weight on the edges of the graph). The following image shows the results for 4 different classification tasks:

  1. gender
  2. relationship status
  3. political view
  4. religious view


Xiang-results.PNG Source: the original paper

The relationship strength is compared to 4 other Facebook graphs (friendship, top-friend, wall, picture) as well as two heuristic graphs: profile-similarity and interaction-count (defined using wall links). The latent variable model outperforms the 6 others graphs on the different classification tasks. Thus, the authors conclude that the latent variable (the relationship strength), is useful at characterizing social networks.