Difference between revisions of "Konstas et al. SIGIR 2009"

From Cohen Courses
Jump to navigationJump to search
Line 9: Line 9:
 
== Summary ==
 
== Summary ==
  
This [[Category::paper]] uses [[UsesMethod::Random walk with restart]] as a method of [[AddressesProblem::Content recommendation]] in social network systems.  
+
This [[Category::paper]] uses [[UsesMethod::Random walk with restart]] as a method of [[AddressesProblem::Content recommendation|recommendation]] in social network systems.  
  
 
As it is in the context of social network, naturally and effectively, a graph based algorithm such as [[UsesMethod::Random walk with restart]] can be considered to perform the [[AddressesProblem::Content recommendation|recommendation]] task.  
 
As it is in the context of social network, naturally and effectively, a graph based algorithm such as [[UsesMethod::Random walk with restart]] can be considered to perform the [[AddressesProblem::Content recommendation|recommendation]] task.  

Revision as of 03:34, 2 October 2012

Citation

Ioannis Konstas, Vassilios Stathopoulos, and Joemon M. Jose. On social networks and collaborative recommendation. In Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, SIGIR ’09, pages 195–202, New York, NY, USA, 2009. ACM.

Online Version

online version

Summary

This paper uses Random walk with restart as a method of recommendation in social network systems.

As it is in the context of social network, naturally and effectively, a graph based algorithm such as Random walk with restart can be considered to perform the recommendation task.

This paper argues that the extra knowledge provided by the users' social activity, such as the social annotation and friendships inherent in the social graph established among users, items and tags, can improve the performance of a recommendation system using methods such as Random walk with restart.

The argue is supported based on a series of comparison experiments between the Random walk with restart model and a user-based collaborative filtering method using the Pearson Correlation similarity. The results show that the graph model system benefits from the additional information embedded in social knowledge. In addition, the graph model outperforms the standard collaborative filtering method.

Background and preparation

  • Network construction

The dataset WordNet is used to construct the network of words.Collect all words in WordNet, and add links between any two words that occurr in the same synset. The resulting graph is a graph where is a set of word / part-of-speech pairs for all the words in WordNet. is the set of edges connecting each pair of synonymous words.

  • Random walk model

Starting from a word with unknown polarity , it moves to a node with probability after the first step. The walk continues until the surfer hits a word with a known polarity.

  • First-passage time

It is very similar to the definition of hitting time. The mean first-passage (hitting) time is defined as the average number of steps a random walker, starting in state , will take to enter state for the first time. Considering a subset vertices of the graph, then means the average number of steps a random walker, starting in state , will take to enter a stae for the first time.

Then it is proven that:

Algorithm description

  1. Construct a word relatedness graph
  2. Define a random walk on the graph
  3. Compute the word's hitting time for both the positive and negative sets of vertices
  4. If the hitting time for the positive set is greater than for the negative set, than the word is classified as negative. Otherwise, it is classified as positive. The ratio between the two hitting times could be used as an indication of how positive/negative the given word is.

Since computing the hitting time is time consuming especially when the graph is large, a Monte Carlo based estimating algorithm is proposed as such: Word polarity using random walks.png

Experiment result

Comparing to other methods, this method is quite successful in both the settings of semi-supervised and unsupervised.

  • In the setting of using WordNet synonyms and hypernyms to construct the network and test set to the set of adjectives. It out performs the spin-model, bootstrap and short-path method.

Hassan & Radev ACL 2010 E1.png

  • It is also compared to the SO-PMI method in the setting of only 14 seeds. Though SO-PMI with a very large dataset performs slightly better than this method, this method is faster and does not need such large corpus.

Hassan & Radev ACL 2010 E2.png

Related papers

Study plan