Active Learning in Link Prediction for social networks

From Cohen Courses
Revision as of 17:09, 10 October 2012 by Zhua (talk | contribs) (→‎Teem members)
Jump to navigationJump to search

Comments

This is a nice idea. One worry I have is that most link datasets are not complete - they are observed interactions, but non-observed interactions might also be just as plausible as the ones actually observed. It's probably hard to find a dataset with actual users you can query.

I see a couple of ways to get around this. One option would be to consider active learning for link classification - there are a few signed networks available, for instance. Lise Getoor has also done some work on link classification, I believe. Another would be to look at link datasets that include time, and explore a variant of active learning that looks ahead into the future, instead of asking users. Eg, you'd do active learning to modify a classifier at time T by using labels from time T+1 as oracles, and then use this to classify at time T+2. Or perhaps you could consider two disjoint networks, and learn some sort of link-prediction strategy on one by looking ahead into the future for the second.

There are a lot of interesting issues that arise when you start asking - what is the impact of making suggestions to a user (as most networks now do)? I don't know if there's any easy way to study this, or get insight into it, from passive time-stamped data, but it's worth thinking about. --Wcohen 19:02, 10 October 2012 (UTC)

Teem members

Basic Idea

Link prediction or friend suggestion is a common feature that most online social network sites have provided.

In a real scenario, the process is not static, that users keep adding friends by themselves or by confirming to the friend suggestions from the system. All the users feedback to the suggestion list is crucial and informative to generate a more accurate suggestion list. More over, we can look at this from an active learning model's perspective. The system is trying to learn a model to predict friends for a user and is able to query the user by giving suggestions and learn from the feedback.

Advantage of using active learning in link prediction

  • It just matches how user is interacting with the system in online social network
  • In a cold start situation, e.g. a new social network site, a choice of informative suggestions and a clever learning from the feedback will quick increase the prediction accuracy even with small number of known friends, which plays a crucial role for a new SNS to grow up and become popular rapidly.
  • It will generate a more diversified suggestion list. If a user moves into a new environment, she/he will be more eager to establish relationship with people in the new environment. A traditional learning algorithm will not give a high score to people in the new environment due to lack of information. However, active learning will possibly do if we a good design on how to make informative suggestions.
  • Active learning is able to adopt other good methods for link prediction as its interior part.

Our Goal

We want to use fewer labeled data (known relation between users) to reach a higher precision and to derive a more natural user scenarios for friend suggestions.

Possible Methods and something we can do

when deriving a detailed learning framework, we have to answer all the questions below:

  • Q: How to choose informative suggestions and what does "informative" mean here?
A: There are couple of ways in active learning.
e.g. To choose the one that will change the model the most if we know the label or to choose the unlabeled data which have different predictions from different hypothesis. 
  • Q: How to understand user feedback?
A: One simple way is just to add the feedback to the current labeled data, or we can add more weight in this data point if it gets a positive feedback and we should   think about how to use it wisely 
when we get a negative feedback.
  • Q: How to "get" feedback information from the data we have? (We actually do not have a user to query.)
A: A way to solve this is to use a subset of training data and use the rest as oracles. It can be a problem that different user will have different number of friends. So I am thinking to add a learned
threshold for each user to bias to suggestion. 
  • Q: What is the general learning method here?
A: As far as I know, supervised random walk is kind of state of the art and well known method for general link prediction on social networks. It can be used as our interior learning method as well as 
a competitive baseline.

Data sets

Stanford data sets and a lot of social network data from datamob, which has a 2009 facebook social graph data. And I also have a protein-protein interaction network dataset. Although it is not social network, it shares some important facts like small-world phenomenon and a power low distribution of node degree and modularity. It can be used to test on a cold start situation as the average degree is only 4 to 5.

Evaluation Metrics

An "unfair" evaluation is to compare the accuracy with other learning methods which have their training data randomly selected. Although I think it is a little bit "unfair", some papers in active learning actually do their evaluation in this way as they are the first to use active learning in a certain data set or a certain problem. So one way to evaluate is to compare the accuracy with supervised random walk. And if it is possible, I hope we can involve real users to interact with the system and provide reviews on the suggestion list.

Related Works