Weng et al WSDM 10

From Cohen Courses
Revision as of 09:33, 2 October 2012 by Anikag (talk | contribs)

This is a paper reviewed for Social Media Analysis 10-802 in Fall 2012.

Citation

author    = {Jianshu Weng and
              Ee-Peng Lim and
              Jing Jiang and
              Qi He},
 title     = {TwitterRank: finding topic-sensitive influential twitterers},
 booktitle = {WSDM},
 year      = {2010},
 pages     = {261-270},
 ee        = {http://doi.acm.org/10.1145/1718487.1718520},
 crossref  = {DBLP:conf/wsdm/2010},
 bibsource = {DBLP, http://dblp.uni-trier.de}

Online Version

TwitterRank: finding topic-sensitive influential twitterers

Summary

The primary goal of this work is to find influential users on Twitter. The work proposes TwitterRank, a variation of the PageRank algorithm, to measure the influence of Twitter users. TwitterRank takes into account both the topical similarity between users and the link structure when measuring influence. The authors first study why users follow one another, and why they follow each other reciprocally, to verify the presence of homophily in the network. The experimental results show that TwitterRank outperforms the baseline techniques on a recommendation task.

Dataset

The dataset consists of Singapore-based Twitter users from 2009. The friend and follower network of the top-1000 Singapore-based users was crawled along with their tweets. The dataset contains |T| = 1,021,039 tweets and |S| = 6,748 users.

Methodology

  • Homophily

To verify topical similarity in friendships, two questions are explored.

1. Whether users with "following" relationships are more topically similar than random users.

2. Whether users with reciprocal "following" relationships are more topically similar than those without it.

To answer these questions, topics are extracted from user documents, where a user document is the collection of all tweets by one user. Latent Dirichlet Allocation (LDA) is applied to learn the topics in an unsupervised manner. The result of applying LDA is represented as:

1. $DT$, a $D \times T$ matrix, where $D$ is the number of twitter users and $T$ is the number of topics.

2. $WT$, a $W \times T$ matrix, where $W$ is the number of unique words and $T$ is the number of topics.

3. $Z$ is a $1 \times N$ matrix, where $N$ is the total number of words and $Z_i$ is the topic assignment for word $w_i$.

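The topical difference between users $s_i$ and $s_j$ is calculated as $dist(i, j) = \sqrt{2 \cdot D_{JS}(i, j)}$, where $D_{JS}(i, j)$ is the Jensen-Shannon divergence between the two users' topic distributions (their rows of the user-topic matrix, normalized to sum to 1).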
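One way to compute a topical difference between two users is sketched below. It assumes the distance is the square root of twice the Jensen-Shannon divergence between the users' normalized topic-count rows. The function names and the choice of base-2 logarithms are illustrative assumptions, not taken from the paper.

```python
import math


def normalize(row):
    """Turn a row of topic counts into a probability distribution."""
    total = sum(row)
    if total == 0:
        raise ValueError("row has no topic assignments")
    return [count / total for count in row]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D_KL(p || q); terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL divergence to the midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def topical_distance(counts_i, counts_j):
    """Assumed distance: sqrt(2 * JS divergence) of the two topic distributions."""
    p, q = normalize(counts_i), normalize(counts_j)
    # max(..., 0.0) guards against tiny negative values from rounding.
    return math.sqrt(2 * max(js_divergence(p, q), 0.0))
```

Two users with identical topic proportions get distance 0, and the distance grows as their topic mixes diverge.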

Hypothesis testing on the topical differences computed from the user-topic matrix is used to answer the two questions. The positive answers to both questions confirm the presence of topical similarity between connected users. This homophily motivates the use of TwitterRank to measure the topic-sensitive influence of users.
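A comparison of this kind can be sketched as a two-sample test on topical differences. The review does not say which test was used; Welch's t statistic is assumed here as one common choice. The function name and the sample values are made up for illustration.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = variance(sample_a), variance(sample_b)  # sample variances
    se2 = va / na + vb / nb  # squared standard error of the mean difference
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Hypothetical topical differences (not data from the paper).
following_pairs = [0.30, 0.35, 0.28, 0.40, 0.33]
random_pairs = [0.55, 0.60, 0.48, 0.52, 0.65]
t_stat, dof = welch_t(following_pairs, random_pairs)
```

A clearly negative statistic would indicate that pairs with a "following" relationship are topically closer than random pairs. Turning it into a p-value requires the t distribution, which is left out of this sketch.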

  • TwitterRank - topic-sensitive influence measure

A directed graph $D(V, E)$ is constructed, where the vertices $V$ are the Twitter users and each edge in $E$ points from a Twitter user to a friend (a user whose tweets he or she follows). A random surfer visits each Twitter user with a certain probability by following the appropriate edge in $D$. TwitterRank performs a topic-sensitive random walk: the transition probability from one user to another depends on the topical similarity between the two users.
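The random walk described above can be sketched for a single topic as follows. The review does not give the exact transition formula, so the weighting used here is an assumption for illustration: the friend's share of tweets among all of the follower's friends, multiplied by 1 minus the absolute difference in the two users' topic shares. The damping value 0.85, the function name, and the toy data are likewise assumptions.

```python
def twitterrank(follows, tweet_counts, topic_share, teleport, gamma=0.85, iters=100):
    """Topic-specific influence scores for one topic (illustrative sketch).

    follows[i]      -- list of users that user i follows (i's friends)
    tweet_counts[j] -- number of tweets published by user j
    topic_share[i]  -- fraction of user i's words assigned to the topic
    teleport[i]     -- topic-specific teleportation probability (sums to 1)
    """
    users = list(follows)
    # Assumed transition weight from follower i to friend j.
    trans = {}
    for i in users:
        total = sum(tweet_counts[a] for a in follows[i])
        for j in follows[i]:
            sim = 1.0 - abs(topic_share[i] - topic_share[j])
            trans[(i, j)] = (tweet_counts[j] / total) * sim if total else 0.0

    rank = {u: 1.0 / len(users) for u in users}
    for _ in range(iters):
        # Teleportation part, then influence flowing from followers to friends.
        new_rank = {u: (1.0 - gamma) * teleport[u] for u in users}
        for (i, j), p in trans.items():
            new_rank[j] += gamma * p * rank[i]
        rank = new_rank
    return rank


# Toy network (hypothetical): a and b follow c, and c follows a.
follows = {"a": ["c"], "b": ["c"], "c": ["a"]}
tweet_counts = {"a": 10, "b": 5, "c": 40}
topic_share = {"a": 0.6, "b": 0.2, "c": 0.7}
teleport = {"a": 0.4, "b": 0.1, "c": 0.5}
scores = twitterrank(follows, tweet_counts, topic_share, teleport)
```

In this toy network, user c is followed by two users and receives the highest score, while b, whom nobody follows, keeps only its teleportation mass.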

Evaluation

The TwitterRank (TR) algorithm is compared to the following baselines.

  • In-degree (InD) - Measures the influence of Twitter users by the number of followers.
  • PageRank (PR) - Measures the influence of Twitter users by making use of only the link structure.
  • Topic-sensitive PageRank (TSPR) - Measures topic-sensitive influence, but uses the same transition probabilities for every topic.

  • Correlation between the rank lists generated by the different algorithms, measured with Kendall's rank correlation.

Corr.png

The table shows that the ranking produced by TR is most similar to that of TSPR.
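The rank-list comparison can be illustrated with a minimal Kendall rank correlation for two rankings of the same users without ties. This is a sketch of the general statistic, not the paper's evaluation code; the function name and the quadratic-time pair counting are choices made for illustration.

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall's tau for two orderings of the same items (no ties).

    rank_a, rank_b -- lists of the same items, most influential first.
    """
    if len(rank_a) < 2 or set(rank_a) != set(rank_b) or len(set(rank_a)) != len(rank_a):
        raise ValueError("need two tie-free rankings of the same items")
    pos_a = {item: k for k, item in enumerate(rank_a)}
    pos_b = {item: k for k, item in enumerate(rank_b)}
    concordant = discordant = 0
    for x, y in combinations(rank_a, 2):
        # A pair is concordant if both rankings order x and y the same way.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            concordant += 1
        else:
            discordant += 1
    n_pairs = len(rank_a) * (len(rank_a) - 1) // 2
    return (concordant - discordant) / n_pairs
```

A value of 1 means the two algorithms rank the users identically, and -1 means one ranking is the exact reverse of the other.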

  • Performance in recommendation task.

The algorithm for the recommendation task is as follows.

Algorithm.png

The different algorithms compare as follows on the recommendation task.

Recommendation.png

The conclusion is that TR outperforms InD and PR by a large margin. TR also outperforms TSPR, because TSPR propagates a Twitter user's influence using the same transition probabilities for all topics.

Study Plan

Some terminology used extensively in the paper

Related Papers

TunkRank is similar to TwitterRank, but does not consider content-based interaction between users.

Haveliwala describes the topic-sensitive PageRank algorithm (TSPR).