Difference between revisions of "Weng et al WSDM 10"

From Cohen Courses
Revision as of 08:08, 2 October 2012

This is a paper reviewed for Social Media Analysis 10-802 in Fall 2012.

Citation

author    = {Jianshu Weng and
              Ee-Peng Lim and
              Jing Jiang and
              Qi He},
 title     = {TwitterRank: finding topic-sensitive influential twitterers},
 booktitle = {WSDM},
 year      = {2010},
 pages     = {261-270},
 ee        = {http://doi.acm.org/10.1145/1718487.1718520},
 crossref  = {DBLP:conf/wsdm/2010},
 bibsource = {DBLP, http://dblp.uni-trier.de}

Online Version

TwitterRank: finding topic-sensitive influential twitterers

Summary

The primary goal of this work is to find influential users on Twitter. The work proposes TwitterRank, a variation of the PageRank algorithm, to measure the influence of Twitter users. TwitterRank takes into account both the topical similarity between users and the link structure when measuring influence. The reasons behind following a user and behind reciprocal following were studied to verify the presence of homophily in the network. The experimental results show that TwitterRank yields significantly better results than the baseline techniques.

Dataset

The dataset consists of Singapore-based Twitter users. The friend and follower network of the top-1000 Singapore users was crawled along with their tweets. The dataset contains |T| = 1,021,039 tweets and |S| = 6748 users.

Methodology

  • Homophily

To verify topical similarity in friendships, two questions are explored.

1. Whether users with "following" relationships are more topically similar than random users.

2. Whether users with reciprocal "following" relationships are more topically similar than those without it.

To answer these questions, topics are extracted from user documents, where a user document is the collection of all tweets by a user. Latent Dirichlet Allocation (LDA) is applied to learn the topics in an unsupervised manner. The result of applying LDA is represented as:

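1. <math>DT</math>, a <math>D \times T</math> matrix, where <math>D</math> is the number of Twitter users and <math>T</math> is the number of topics.

2. <math>WT</math>, a <math>W \times T</math> matrix, where <math>W</math> is the number of unique words and <math>T</math> is the number of topics.

3. <math>Z</math>, a <math>1 \times N</math> matrix, where <math>N</math> is the total number of words and <math>Z_{i}</math> is the topic assignment for word <math>w_{i}</math>.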
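The relationship between the three LDA outputs can be illustrated with a minimal sketch. It assumes (this page does not state it) that each entry of <math>DT</math> counts how many of a user's words were assigned to a topic, and each entry of <math>WT</math> counts how often a unique word was assigned to a topic. The names `user_ids`, `word_ids`, and `count_matrices` are hypothetical and chosen only for illustration.

```python
import numpy as np

def count_matrices(user_ids, word_ids, Z, D, W, T):
    """Build DT (D x T) and WT (W x T) count matrices from topic assignments.

    user_ids[n] : index of the user who wrote token n   (assumed input)
    word_ids[n] : vocabulary index of token n            (assumed input)
    Z[n]        : topic assigned to token n by LDA
    """
    DT = np.zeros((D, T), dtype=int)
    WT = np.zeros((W, T), dtype=int)
    # np.add.at accumulates correctly even when index pairs repeat
    np.add.at(DT, (user_ids, Z), 1)
    np.add.at(WT, (word_ids, Z), 1)
    return DT, WT

# Toy data: 5 tokens, 2 users, 3 unique words, 2 topics
user_ids = np.array([0, 0, 0, 1, 1])
word_ids = np.array([0, 1, 1, 2, 0])
Z = np.array([0, 1, 1, 1, 0])

DT, WT = count_matrices(user_ids, word_ids, Z, D=2, W=3, T=2)
print(DT)
print(WT)
```

Under this assumption, both matrices sum to <math>N</math>, the total number of words.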

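The topical difference between users <math>s_{i}</math> and <math>s_{j}</math> is calculated as

<math>dist(i,j) = \sqrt{2 \ast D_{js}(i,j)}</math>

where <math>D_{js}(i,j)</math> is the Jensen-Shannon divergence between the topic distributions of the two users.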
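The distance above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: each user's topic distribution is taken to be the corresponding row of <math>DT</math> normalized to sum to 1, <math>D_{js}</math> is taken to be the Jensen-Shannon divergence, and the natural logarithm is used. The function names and the toy matrix are hypothetical.

```python
import numpy as np

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions (natural log)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)

    def kl(a, b):
        # Terms with a == 0 contribute nothing to the KL divergence
        mask = a > 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def topical_distance(DT, i, j):
    """dist(i, j) = sqrt(2 * D_js(i, j)) on row-normalized topic counts.

    Assumes every user has at least one word, so no row of DT sums to zero.
    """
    DT = np.asarray(DT, dtype=float)
    P = DT / DT.sum(axis=1, keepdims=True)
    # Guard against tiny negative values caused by floating-point rounding
    return np.sqrt(2.0 * max(js_divergence(P[i], P[j]), 0.0))

# Toy user-topic counts: users 0 and 1 share topics, user 2 differs
DT_toy = np.array([[8, 2, 0],
                   [7, 3, 0],
                   [0, 1, 9]])

d01 = topical_distance(DT_toy, 0, 1)
d02 = topical_distance(DT_toy, 0, 2)
print(d01, d02)
```

With this definition, identical distributions have distance 0, and a smaller distance means two users are more topically similar.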

Hypothesis Testing

The presence of homophily indicates that connected users are topically similar, which motivates the use of TwitterRank.

Study Plan

Homophily

Related Papers

Homophily Paper