Chen et al., CHI 2010

Citation

Authors : Jilin Chen, Rowan Nairn, Les Nelson, Michael Bernstein, Ed H. Chi

Title : Short and Tweet: Experiments on Recommending Content from Information Streams

Conference : CHI 2010

Online version

Paper : [1]
System website : [2]

Summary

The paper describes extensive experiments on URL recommendation on Twitter, aimed at directing user attention within information streams. The task is to recommend interesting URLs to Twitter users. The authors combine social information (the follower-followee network), content information (the text of tweets), and candidate URL selection to find relevant URLs for a particular user.

The authors claim that the approach generalizes to information streams other than Twitter, such as photos and status messages on Facebook or news items on Google Reader.

Brief description of the method

They tested 12 algorithms, each defined by a choice along three design dimensions: candidate URL selection, content information, and social information.

For candidate URLs selection, they considered two approaches :

  • Selecting URLs posted by the user's followees and followees-of-followees (FoF)
  • Selecting popular and trending URLs (Popular)

Candidate URLs are simply drawn from one of these two pools; a rough sketch of how such pools could be built is given after this list.
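Below is a minimal sketch of how the two candidate pools might be assembled. The data structures (a followees dict mapping each account to the set of accounts it follows, and a tweeted_urls dict mapping each account to the URLs it has tweeted) and the popularity proxy are illustrative assumptions, not details from the paper.

  from collections import Counter

  def fof_candidates(user, followees, tweeted_urls):
      """FoF pool: URLs tweeted by the user's followees and followees-of-followees."""
      fof = set(followees.get(user, set()))
      for f in followees.get(user, set()):
          fof |= followees.get(f, set())
      fof.discard(user)
      return {url for account in fof for url in tweeted_urls.get(account, [])}

  def popular_candidates(tweeted_urls, top_k=100):
      """Popular pool: URLs ranked by how many accounts tweeted them."""
      counts = Counter(url for urls in tweeted_urls.values() for url in urls)
      return [url for url, _ in counts.most_common(top_k)]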

For incorporating content information, they considered three approaches :

  • Not using this information at all (None)
  • Using the user's own topics (Self-Topic): each candidate URL is scored by the cosine similarity between the tweets mentioning it and the user's topic vector, a bag-of-words representation of the user's own tweets with tf-idf weighting and normalization (see the similarity sketch after this list)
  • Using the topics of the user's followees (Followee-Topic): each candidate URL is scored by the cosine similarity between the tweets mentioning it and the followee topic vector. For a user u and a followee f, the followee-vector of f with respect to u is built by taking all words in f's tweets, ranking them by decreasing weight (computed with the same tf-idf and normalization as above), keeping the top 20% of words, and removing words that none of u's other followees mention. All of u's followee-vectors are then combined into the followee topic vector of user u.
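The following is a rough sketch of the Self-Topic scoring idea as described above: the user's topic vector is a tf-idf bag-of-words over their own tweets, and each candidate URL is scored by cosine similarity against the concatenated tweets that mention it. scikit-learn is used for convenience; the exact tf-idf variant and preprocessing used in the paper may differ.

  from sklearn.feature_extraction.text import TfidfVectorizer
  from sklearn.metrics.pairwise import cosine_similarity

  def rank_by_self_topic(user_tweets, url_to_tweets):
      """Score each candidate URL by cosine similarity between the user's topic
      vector (tf-idf over the user's own tweets) and the tweets mentioning it."""
      urls = list(url_to_tweets)
      docs = [" ".join(user_tweets)] + [" ".join(url_to_tweets[u]) for u in urls]
      tfidf = TfidfVectorizer().fit_transform(docs)   # rows are L2-normalized
      sims = cosine_similarity(tfidf[0], tfidf[1:]).ravel()
      return sorted(zip(urls, sims), key=lambda x: x[1], reverse=True)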

For incorporating social information, they considered two approaches :

  • Not using this information at all (None)
  • Using the number of times a URL has been re-tweeted by the user's followees and followees-of-followees (Vote). Specifically, for a user u, the score of a URL is the total vote power of all of u's followees-of-followees who have mentioned the URL. The vote power of a followee-of-followee f is proportional to the log of the number of u's followees who follow f, and to the log of the average time interval between f's consecutive tweets (a sketch of this scoring is given after this list).
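A sketch of the Vote score following the description above: each followee-of-followee f of user u who mentioned the URL contributes a vote power proportional to the log of the number of u's followees who follow f and to the log of f's average inter-tweet interval. The +1 smoothing inside the logs and the data structures are assumptions made for illustration.

  import math

  def vote_score(url, user, followees, mentioned_urls, avg_tweet_interval):
      """Total vote power, over all of user's followees-of-followees that
      mentioned `url`, following the description above (sketch only)."""
      user_followees = followees.get(user, set())
      fof = set(user_followees)
      for f in user_followees:
          fof |= followees.get(f, set())
      score = 0.0
      for f in fof:
          if url not in mentioned_urls.get(f, set()):
              continue
          # number of u's followees who follow f
          endorsers = sum(1 for g in user_followees if f in followees.get(g, set()))
          # +1 smoothing is an assumption to keep the logs finite
          score += math.log(1 + endorsers) * math.log(1 + avg_tweet_interval.get(f, 0.0))
      return score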

They tried every combination of choices across these three dimensions, resulting in 2 × 3 × 2 = 12 algorithms.
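For concreteness, the 12 variants are simply the Cartesian product of the three design dimensions:

  from itertools import product

  candidate_sets = ["FoF", "Popular"]
  content_ranking = ["None", "Self-Topic", "Followee-Topic"]
  social_ranking = ["None", "Vote"]

  algorithms = list(product(candidate_sets, content_ranking, social_ranking))
  assert len(algorithms) == 12   # 2 x 3 x 2 design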

Experimental results

They created a website [3] and asked Twitter users to judge the relevance of URLs produced by each of the 12 methods above, collecting 2640 URLs with corresponding relevance judgements. They then trained a logistic regression model that predicts the probability of a URL being relevant, using CandidateSet, Ranking-Topic, and Ranking-Social as features.
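One plausible way to set up such a regression (the paper does not give implementation details, so the column names and toy rows below are purely illustrative): one-hot encode the three design factors and fit a logistic regression on the binary relevance judgements.

  import pandas as pd
  from sklearn.linear_model import LogisticRegression
  from sklearn.preprocessing import OneHotEncoder

  # Toy rows: one per judged URL, labelled with the design factors of the
  # algorithm that produced it and the user's binary relevance judgement.
  df = pd.DataFrame({
      "CandidateSet":  ["FoF", "Popular", "FoF", "Popular"],
      "RankingTopic":  ["Self-Topic", "None", "Followee-Topic", "None"],
      "RankingSocial": ["Vote", "None", "Vote", "None"],
      "relevant":      [1, 0, 1, 0],
  })

  X = OneHotEncoder().fit_transform(df[["CandidateSet", "RankingTopic", "RankingSocial"]])
  model = LogisticRegression().fit(X, df["relevant"])   # P(relevant | design factors)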

The best performing method was FoF-SelfTopic-Vote: 72.09% of its recommended items were judged interesting, while the baseline (Popular-None-None) achieved only 32.50%. The feature that boosted performance the most was Vote, followed by Self-Topic, which suggests that ranking URLs by the user's topic relevance and social network greatly increases the chance that a recommended URL is interesting to the user.

The paper also examines the effects of interactions between these features on the overall performance of the system.

Datasets used

Not shared.