Chen et al., CHI 2010
Latest revision as of 17:50, 4 February 2011
Citation
Authors : Jilin Chen, Rowan Nairn, Les Nelson, Michael Bernstein, Ed H. Chi
Title : Short and Tweet: Experiments on Recommending Content from Information Streams
Conference : CHI 2010
Online version
Paper : [1]
System website : [2]
Summary
This paper describes extensive experiments on content recommendation on Twitter, aimed at directing user attention in information streams. The task is to recommend interesting URLs to Twitter users. The authors combined social information (follower-followee relationships), content information (the text of tweets), and candidate URL selection to find relevant URLs for a particular user.
The authors claim that the method can be generalized to handle information streams other than Twitter, such as photos or status messages on Facebook, news on Google Reader, etc.
Brief description of the method
They tested 12 algorithms, which vary along three dimensions: candidate URL selection, content information, and social information.
For candidate URL selection, they considered two approaches:
- Selecting URLs posted by the user's followees and followees-of-followees (FoF)
- Selecting popular and trending URLs (Popular)
Each algorithm simply draws its candidate URLs from one of these two pools.
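A minimal sketch of the two candidate pools, assuming a toy in-memory follow graph. The names `follows`, `posts`, `fof_candidates`, and `popular_candidates` are hypothetical, and ranking the Popular pool by the number of distinct accounts posting a URL is an assumed stand-in for the paper's popular/trending criterion, which this summary does not spell out.

```python
from collections import Counter

# Hypothetical toy structures (not from the paper):
#   follows[user] -> set of accounts the user follows (their followees)
#   posts[user]   -> list of URLs the user has tweeted


def fof_candidates(user, follows, posts):
    """FoF pool: URLs posted by the user's followees and followees-of-followees."""
    followees = follows.get(user, set())
    fofs = set()
    for f in followees:
        fofs |= follows.get(f, set())
    authors = (followees | fofs) - {user}
    return {url for a in authors for url in posts.get(a, [])}


def popular_candidates(posts, k):
    """Popular pool: the k URLs posted by the most distinct accounts.

    Counting distinct posters is an assumption standing in for the
    paper's popular/trending selection.
    """
    counts = Counter(url for urls in posts.values() for url in set(urls))
    return {url for url, _ in counts.most_common(k)}
```

For example, if `u` follows `a` and `b`, and `a` follows `c`, then `fof_candidates("u", ...)` collects every URL posted by `a`, `b`, or `c`.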
For incorporating content information, they considered three approaches:
- Not using this information at all (None)
- Using the self-topic (Self-Topic): the cosine similarity between the tweets mentioning a URL and the user's topic vector, which is a bag-of-words model of the user's tweets weighted by tf-idf and normalized
- Using the followee-topic (Followee-Topic): the cosine similarity between the tweets mentioning a URL and the user's followees topic vector. For a user u and a followee f, the followee-vector of f with respect to u is built by taking all words in f's tweets, ranking them in decreasing order of weight (tf-idf with the same normalization as above), keeping the top 20% of words, and removing words that none of u's other followees mention. All of u's followee-vectors are then combined into u's followees topic vector.
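The two topic-based rankers can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: idf is taken as log(N/df) with one document per user, the 20% cutoff is rounded up, and u's followee-vectors are merged by summing their weights, since the summary does not say how they are combined. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tfidf_vector(words, doc_freq, n_docs):
    """L2-normalized tf-idf weights for a bag of words.

    Assumption: idf = log(n_docs / df), where df counts users whose tweets
    contain the word. Words missing from doc_freq are skipped.
    """
    tf = Counter(words)
    vec = {w: c * math.log(n_docs / doc_freq[w]) for w, c in tf.items() if doc_freq.get(w)}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {w: v / norm for w, v in vec.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def followee_vector(f_vec, words_of_other_followees):
    """Top 20% of f's words by weight, minus words no other followee of u mentions."""
    ranked = sorted(f_vec, key=f_vec.get, reverse=True)
    top = ranked[: (len(ranked) + 4) // 5]  # ceil(20%) of f's vocabulary
    return {w: f_vec[w] for w in top if w in words_of_other_followees}


def followees_topic_vector(followee_vecs, followee_words):
    """Merge u's followee-vectors by summing weights (an assumed combination rule).

    followee_vecs: followee -> tf-idf vector; followee_words: followee -> set of words used.
    """
    combined = defaultdict(float)
    for f, f_vec in followee_vecs.items():
        others = set()
        for g, words in followee_words.items():
            if g != f:
                others |= words
        for w, v in followee_vector(f_vec, others).items():
            combined[w] += v
    return dict(combined)


def topic_score(url_tweet_words, profile_vec, doc_freq, n_docs):
    """Cosine between the tweets mentioning a URL and a topic profile."""
    return cosine(tfidf_vector(url_tweet_words, doc_freq, n_docs), profile_vec)
```

Passing u's own tf-idf vector as `profile_vec` gives the Self-Topic score; passing the output of `followees_topic_vector` gives the Followee-Topic score.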
For incorporating social information, they considered two approaches:
- Not using this information at all (None)
- Using the number of times a URL has been re-tweeted by the user's followees and followees-of-followees (Vote). Specifically, for a user u, the score of a URL is the total vote power of all of u's followees-of-followees who have mentioned the URL. The vote power of a followee-of-followee f is proportional to the log of the number of u's followees who follow f, and to the log of the average time interval between f's consecutive tweets.
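A sketch of the Vote score, reading "proportional to the log of A and to the log of B" as the plain product of the two logs with no extra constant. The average tweet interval is assumed to be in seconds, and `follows`, `posts`, and `avg_interval` are hypothetical toy structures.

```python
import math


def vote_power(n_followees_following_f, avg_interval_seconds):
    """Vote power of a followee-of-followee f (assumed: product of the two logs).

    Under this literal reading, an f followed by only one of u's followees
    gets log(1) = 0 power.
    """
    return math.log(n_followees_following_f) * math.log(avg_interval_seconds)


def vote_score(url, user, follows, posts, avg_interval):
    """Total vote power of u's followees-of-followees who mentioned `url`."""
    followees = follows.get(user, set())
    fofs = set()
    for g in followees:
        fofs |= follows.get(g, set())
    score = 0.0
    for f in fofs - {user}:
        if url in posts.get(f, []):
            # How many of u's followees follow f (at least 1, since f is a FoF).
            n = sum(1 for g in followees if f in follows.get(g, set()))
            score += vote_power(n, avg_interval[f])
    return score
```

The log of the tweet interval gives accounts that post less often more weight per vote than very active ones.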
They tried every combination of one approach from each of the three groups, giving 2 × 3 × 2 = 12 algorithms.
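The 2 × 3 × 2 design can be enumerated with `itertools.product`. The short labels below (for example `SelfTopic`) mirror names used in the results such as FoF-SelfTopic-Vote; the constant names themselves are hypothetical.

```python
from itertools import product

# One entry per approach in each of the three dimensions.
CANDIDATE_SETS = ["FoF", "Popular"]
TOPIC_RANKINGS = ["None", "SelfTopic", "FolloweeTopic"]
SOCIAL_RANKINGS = ["None", "Vote"]

# Every combination of one choice per dimension, named Candidate-Topic-Social.
ALGORITHMS = ["-".join(combo) for combo in product(CANDIDATE_SETS, TOPIC_RANKINGS, SOCIAL_RANKINGS)]

print(len(ALGORITHMS))  # 12
print(ALGORITHMS[0])    # FoF-None-None
```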
Experimental result
They built a website, http://www.zerozero88.com/, and asked Twitter users to judge the relevance of URLs produced by each of the 12 methods above, yielding 2640 URLs with relevance judgements. They then trained a logistic regression model that predicts the probability of a URL being relevant, using CandidateSet, Ranking-Topic, and Ranking-Social as features.
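A self-contained sketch of such a model, fitted by plain batch gradient descent on a handful of hypothetical judgements (not the study's data). Encoding each factor with Popular/None/None as reference levels and minimizing an unregularized log-loss are assumptions made for illustration; the fitted coefficients then act as per-feature effects.

```python
import math

# Hypothetical toy judgements: ((CandidateSet, Ranking-Topic, Ranking-Social), relevant?)
JUDGEMENTS = [
    (("FoF", "SelfTopic", "Vote"), 1),
    (("FoF", "None", "Vote"), 1),
    (("Popular", "SelfTopic", "Vote"), 1),
    (("FoF", "SelfTopic", "None"), 1),
    (("FoF", "SelfTopic", "None"), 0),
    (("Popular", "FolloweeTopic", "None"), 0),
    (("Popular", "None", "None"), 1),
    (("Popular", "None", "None"), 0),
    (("FoF", "None", "None"), 0),
]


def encode(cand, topic, social):
    """Intercept plus one-hot indicators (reference levels: Popular, None, None)."""
    return [
        1.0,
        1.0 if cand == "FoF" else 0.0,
        1.0 if topic == "SelfTopic" else 0.0,
        1.0 if topic == "FolloweeTopic" else 0.0,
        1.0 if social == "Vote" else 0.0,
    ]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_loss(w, xs, ys):
    """Mean negative log-likelihood, computed as softplus(z) - y*z for stability."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = sum(wi * xi for wi, xi in zip(w, x))
        total += max(z, 0.0) + math.log1p(math.exp(-abs(z))) - y * z
    return total / len(xs)


def fit_logistic(xs, ys, lr=0.5, epochs=1000):
    """Batch gradient descent on the mean log-loss, starting from all-zero weights."""
    w = [0.0] * len(xs[0])
    n = len(xs)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for j, xj in enumerate(x):
                grad[j] += (p - y) * xj / n
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


xs = [encode(*features) for features, _ in JUDGEMENTS]
ys = [label for _, label in JUDGEMENTS]
weights = fit_logistic(xs, ys)
names = ["intercept", "FoF", "SelfTopic", "FolloweeTopic", "Vote"]
print(dict(zip(names, weights)))
```

With real judgements, a positive coefficient (for example on `Vote`) would indicate that the feature raises the predicted probability of relevance relative to its reference level.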
The best-performing method was FoF-SelfTopic-Vote: 72.09% of the items it recommended were judged interesting, compared with only 32.50% for the baseline (Popular-None-None). The feature that boosted performance the most was Vote, followed by Self-Topic. This suggests that ranking URLs by the user's topic relevance and by the social network greatly increases the chance that a URL is interesting to the user.
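For scale, the two reported percentages for FoF-SelfTopic-Vote and the Popular-None-None baseline compare as follows (simple arithmetic on the figures above):

```latex
72.09\% - 32.50\% = 39.59 \text{ percentage points}, \qquad \frac{72.09}{32.50} \approx 2.22
```

That is, the best method's rate of interesting recommendations is roughly 2.2 times the baseline's.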
In the paper, they also examined how interactions between these features affect the overall performance of the system.
Dataset used
Twitter data (not shared).