Difference between revisions of "Active Learning in Link Prediction for social networks"

From Cohen Courses
Jump to navigationJump to search
Line 1: Line 1:
This is just a draft version for the project idea and it is in a partner hunting status.
+
== Teem members ==
 +
[[User:Zhua|Troy Hua]]
 +
And I am looking for teammates :-)
 +
 
 +
== Basic Idea ==
 +
Link prediction or friend suggestion is a common feature that most online social network sites have provided.  
  
== Members ==
+
In a real scenario, the process is not static, that users keep adding friends by themselves or by confirming to the friend suggestions from the system. All the users feedback to the suggestion list is crucial and informative to generate a more accurate suggestion list. More over, we can look at this from an active learning model's prospective. The system is trying to learn a model to predict friends for a user and is able to query the user by giving suggestions and learn from the feedback.
[[User:Zhua|Troy Hua]]
+
 
 +
== Advantage of using  active learning in link prediction ==
 +
*It just matches how user is interacting with the system in online social network
 +
*In a cold start situation, e.g. a new social network site, a choice of informative suggestions and a clever learning from the feedback will quick increase the prediction accuracy even with small number of known friends, which plays a crucial role for a new SNS to grow up and become popular rapidly.
 +
*It will generate a more diversified suggestion list. If a user moves into a new environment, she/he will be more eager to establish relationship with people in the new environment. A traditional learning algorithm will not give a high score to people in the new environment due to lack of information. However, active learning will possibly do if we a good design on how to make informative suggestions.
 +
*Active learning is able to adopt other good methods for link prediction as its interior part.
  
And I am looking for teammates :-)
+
== Our Goal ==
 +
We want to use fewer labeled data (known relation between users) to reach a higher precision and to derive a more natural user scenarios for friend suggestions.
  
== Idea ==
+
== Possible Methods and something we can do ==
In most online social networks, link prediction often works as a friend suggestion tool that users make confirmation to a list of friend suggestions from the system. The feedback of users can be very informative as it sometimes tells the bias of this user, especially temporal bias or interest. And this scenario is similar to an active learning framework that the system can query the user by providing a suggestion and learn from the feedback. We can also see this feedback in a learning-to-rank model that users' confirmation on the suggestions as a click event on the search results. Active learning in link prediction intuitively also should work well in a cold start situation due to lack of information and active learning can quickly converge and learn more about the user from informative suggestions feedback.
+
when deriving a detailed learning framework, we have to answer all the questions below:
 +
*Q: How to choose informative suggestions and what does "informative" mean here?
 +
A: There are couple of ways in active learning, like choosing the one that will change the model the most if we know the label or to choose the unlabeled data which have different predictions from different hypothesis.
 +
*Q: How to understand user feedback?
 +
A: One simple way is just to add the feedback to the current labeled data, or we can add more weight in this data point if it gets a positive feedback and we should think about how to use it wisely when we get a negative feedback.
 +
*Q: How to "get" feedback information from the data we have? (We actually do not have a user to query.)
 +
A: A way to solve this is to use a subset of training data and use the rest as oracles. It can be a problem that different user will have different number of friends. So I am thinking to add a learned threshold for each user to bias to suggestion.
 +
*Q: What is the general learning method here?
 +
A: As far as I know, supervised random walk is kind of state of the art and well known method for general link prediction on social networks. It can be used as our interior learning method as well as a competitive baseline.
  
 
== Data sets ==
 
== Data sets ==
Line 13: Line 32:
  
 
== Evaluation Metrics ==
 
== Evaluation Metrics ==
Evaluation is an interesting topic in social network link prediction. One common way is to divide the graph by different timestamps and train on an early graph and test the prediction on a later one. This sounds natural, however, sometimes and especially in a new social network, the sequence of adding friends is just based on the how the system gives the suggestion rather than some real features we can capture. And we should have a more suitable way to deal with our active learning framework.
+
An "unfair" evaluation is to compare the accuracy with other learning methods which have their training data randomly selected. Although I think it is a little bit "unfair", but some papers in active learning actually do their evaluation in this way as they are the first to use active learning in a certain data set or a certain problem. So one way to evaluate is to compare the accuracy with supervised random walk. And if it is possible, I hope we can involve real users to interact with the system and provide reviews on the suggestion list.
 +
 
 +
== Related Works ==
 +
* [[Supervised Random Walk]]
 +
* [[Link propagation: A fast semi-supervised learning algorithm for link prediction]]
 +
* [[Active learning literature survey]]

Revision as of 02:37, 9 October 2012

Teem members

Troy Hua And I am looking for teammates :-)

Basic Idea

Link prediction or friend suggestion is a common feature that most online social network sites have provided.

In a real scenario, the process is not static, that users keep adding friends by themselves or by confirming to the friend suggestions from the system. All the users feedback to the suggestion list is crucial and informative to generate a more accurate suggestion list. More over, we can look at this from an active learning model's prospective. The system is trying to learn a model to predict friends for a user and is able to query the user by giving suggestions and learn from the feedback.

Advantage of using active learning in link prediction

  • It just matches how user is interacting with the system in online social network
  • In a cold start situation, e.g. a new social network site, a choice of informative suggestions and a clever learning from the feedback will quick increase the prediction accuracy even with small number of known friends, which plays a crucial role for a new SNS to grow up and become popular rapidly.
  • It will generate a more diversified suggestion list. If a user moves into a new environment, she/he will be more eager to establish relationship with people in the new environment. A traditional learning algorithm will not give a high score to people in the new environment due to lack of information. However, active learning will possibly do if we a good design on how to make informative suggestions.
  • Active learning is able to adopt other good methods for link prediction as its interior part.

Our Goal

We want to use fewer labeled data (known relation between users) to reach a higher precision and to derive a more natural user scenarios for friend suggestions.

Possible Methods and something we can do

when deriving a detailed learning framework, we have to answer all the questions below:

  • Q: How to choose informative suggestions and what does "informative" mean here?
A: There are couple of ways in active learning, like choosing the one that will change the model the most if we know the label or to choose the unlabeled data which have different predictions from different hypothesis. 
  • Q: How to understand user feedback?
A: One simple way is just to add the feedback to the current labeled data, or we can add more weight in this data point if it gets a positive feedback and we should think about how to use it wisely when we get a negative feedback.
  • Q: How to "get" feedback information from the data we have? (We actually do not have a user to query.)
A: A way to solve this is to use a subset of training data and use the rest as oracles. It can be a problem that different user will have different number of friends. So I am thinking to add a learned threshold for each user to bias to suggestion. 
  • Q: What is the general learning method here?
A: As far as I know, supervised random walk is kind of state of the art and well known method for general link prediction on social networks. It can be used as our interior learning method as well as a competitive baseline.

Data sets

Stanford data sets and a lot of social network data from datamob, which has a 2009 facebook social graph data. And I also have a protein-protein interaction network dataset. Although it is not social network, it shares some important facts like small-world phenomenon and a power low distribution of node degree and modularity. It can be used to test on a cold start situation as the average degree is only 4 to 5.

Evaluation Metrics

An "unfair" evaluation is to compare the accuracy with other learning methods which have their training data randomly selected. Although I think it is a little bit "unfair", but some papers in active learning actually do their evaluation in this way as they are the first to use active learning in a certain data set or a certain problem. So one way to evaluate is to compare the accuracy with supervised random walk. And if it is possible, I hope we can involve real users to interact with the system and provide reviews on the suggestion list.

Related Works