Difference between revisions of "Konstas et al. SIGIR 2009"
Line 19: | Line 19: | ||
== The dataset == | == The dataset == | ||
− | The source of [[last.fm|this dataset]] comes from last.fm - a Web 2.0 website. As introduced in the [[last.fm|dataset]], there are three components serving as the vertices in the graph - users, music tracks and tags, deriving relation matrices as User-Track(UTr), User-Tag(UTg) and Track-Tag(TrTg). | + | The source of [[UsesDataset:last.fm|this dataset]] comes from last.fm - a Web 2.0 website. As introduced in the [[last.fm|dataset]], there are three components serving as the vertices in the graph - users, music tracks and tags, deriving relation matrices as User-Track(UTr), User-Tag(UTg) and Track-Tag(TrTg). |
The UTr sub-matrix consists of the playcount of every track by each user, i.e. the number of times it has been listened to. | The UTr sub-matrix consists of the playcount of every track by each user, i.e. the number of times it has been listened to. |
Revision as of 03:08, 2 October 2012
Contents
Citation
Ioannis Konstas, Vassilios Stathopoulos, and Joemon M. Jose. On social networks and collaborative recommendation. In Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, SIGIR ’09, pages 195–202, New York, NY, USA, 2009. ACM.
Online Version
Summary
This paper uses Random walk with restart as a method of content recommendation in social network systems.
As it is in the context of social network, naturally and effectively, a graph based algorithm such as Random walk with restart can be considered to perform the recommendation task.
This paper argues that the extra knowledge provided by the users' social activity, such as the social annotation and friendships inherent in the social graph established among users, items and tags, can improve the performance of a recommendation system using methods such as Random walk with restart.
The argue is supported based on a series of comparison experiments between the Random walk with restart model and a user-based collaborative filtering method using the Pearson Correlation similarity. The results show that the graph model system benefits from the additional information embedded in social knowledge. In addition, the graph model outperforms the standard collaborative filtering method.
The dataset
The source of this dataset comes from last.fm - a Web 2.0 website. As introduced in the dataset, there are three components serving as the vertices in the graph - users, music tracks and tags, deriving relation matrices as User-Track(UTr), User-Tag(UTg) and Track-Tag(TrTg).
The UTr sub-matrix consists of the playcount of every track by each user, i.e. the number of times it has been listened to.
The UU sub-matrix consists the average user playcount. This is crucial because the UTr sub-matrix may contain very large values and thus the binary values in UU will be suppressed.
In the case of the UTg sub-matrix, the associated tags for each user have been collected based on popularity. we applied an exponential decay function to the values of the tags of each user in the matrix, where the most popular tag gets the average user's playcount. The same process was performed for the TrTg sub-matrix accordingly.
The full social graph look like:
Algorithm description
- Construct a word relatedness graph
- Define a random walk on the graph
- Compute the word's hitting time for both the positive and negative sets of vertices
- If the hitting time for the positive set is greater than for the negative set, than the word is classified as negative. Otherwise, it is classified as positive. The ratio between the two hitting times could be used as an indication of how positive/negative the given word is.
Since computing the hitting time is time consuming especially when the graph is large, a Monte Carlo based estimating algorithm is proposed as such:
Experiment result
Comparing to other methods, this method is quite successful in both the settings of semi-supervised and unsupervised.
- In the setting of using WordNet synonyms and hypernyms to construct the network and test set to the set of adjectives. It out performs the spin-model, bootstrap and short-path method.
- It is also compared to the SO-PMI method in the setting of only 14 seeds. Though SO-PMI with a very large dataset performs slightly better than this method, this method is faster and does not need such large corpus.
Related papers
- Hiroya Takamura, Takashi Inui, and Manabu Okumura.2005. Extracting semantic orientations of words using spin model. In ACL ’05.
- Peter Turney and Michael Littman. 2003. Measuring praise and criticism: Inference of semantic orientation from association. ACM Transactions on Information Systems, 21:315–346.
- Jaap Kamps, Maarten Marx, Robert J. Mokken, and Maarten De Rijke. 2004. Using wordnet to measure semantic orientations of adjectives. In National Institute for, pages 1115–1118.
- Minqing Hu and Bing Liu. 2004. Mining and summarizing customer reviews. In KDD ’04
- Peter D. Turney. 2002. Thumbs up or thumbs down?:semantic orientation applied to unsupervised classification of reviews. In ACL ’02
Study plan
- Article: Random walk
- Article: Monto Carlo Method