Active Learning in Link Prediction for social networks

From Cohen Courses
Revision as of 02:44, 9 October 2012 by Zhua (talk | contribs) (→‎Basic Idea)
Jump to navigationJump to search

Teem members

Troy Hua And I am looking for teammates :-)

Basic Idea

Link prediction or friend suggestion is a common feature that most online social network sites have provided.

In a real scenario, the process is not static, that users keep adding friends by themselves or by confirming to the friend suggestions from the system. All the users feedback to the suggestion list is crucial and informative to generate a more accurate suggestion list. More over, we can look at this from an active learning model's perspective. The system is trying to learn a model to predict friends for a user and is able to query the user by giving suggestions and learn from the feedback.

Advantage of using active learning in link prediction

  • It just matches how user is interacting with the system in online social network
  • In a cold start situation, e.g. a new social network site, a choice of informative suggestions and a clever learning from the feedback will quick increase the prediction accuracy even with small number of known friends, which plays a crucial role for a new SNS to grow up and become popular rapidly.
  • It will generate a more diversified suggestion list. If a user moves into a new environment, she/he will be more eager to establish relationship with people in the new environment. A traditional learning algorithm will not give a high score to people in the new environment due to lack of information. However, active learning will possibly do if we a good design on how to make informative suggestions.
  • Active learning is able to adopt other good methods for link prediction as its interior part.

Our Goal

We want to use fewer labeled data (known relation between users) to reach a higher precision and to derive a more natural user scenarios for friend suggestions.

Possible Methods and something we can do

when deriving a detailed learning framework, we have to answer all the questions below:

  • Q: How to choose informative suggestions and what does "informative" mean here?
A: There are couple of ways in active learning.
e.g. To choose the one that will change the model the most if we know the label or to choose the unlabeled data which have different predictions from different hypothesis. 
  • Q: How to understand user feedback?
A: One simple way is just to add the feedback to the current labeled data, or we can add more weight in this data point if it gets a positive feedback and we should   think about how to use it wisely 
when we get a negative feedback.
  • Q: How to "get" feedback information from the data we have? (We actually do not have a user to query.)
A: A way to solve this is to use a subset of training data and use the rest as oracles. It can be a problem that different user will have different number of friends. So I am thinking to add a learned
threshold for each user to bias to suggestion. 
  • Q: What is the general learning method here?
A: As far as I know, supervised random walk is kind of state of the art and well known method for general link prediction on social networks. It can be used as our interior learning method as well as 
a competitive baseline.

Data sets

Stanford data sets and a lot of social network data from datamob, which has a 2009 facebook social graph data. And I also have a protein-protein interaction network dataset. Although it is not social network, it shares some important facts like small-world phenomenon and a power low distribution of node degree and modularity. It can be used to test on a cold start situation as the average degree is only 4 to 5.

Evaluation Metrics

An "unfair" evaluation is to compare the accuracy with other learning methods which have their training data randomly selected. Although I think it is a little bit "unfair", some papers in active learning actually do their evaluation in this way as they are the first to use active learning in a certain data set or a certain problem. So one way to evaluate is to compare the accuracy with supervised random walk. And if it is possible, I hope we can involve real users to interact with the system and provide reviews on the suggestion list.

Related Works