Latest revision as of 08:09, 4 October 2012
This is a paper reviewed for Social Media Analysis 10-802 in Fall 2012.
Citation
author = {Jianshu Weng and Ee-Peng Lim and Jing Jiang and Qi He}, title = {TwitterRank: finding topic-sensitive influential twitterers}, booktitle = {WSDM}, year = {2010}, pages = {261-270}, ee = {http://doi.acm.org/10.1145/1718487.1718520}, crossref = {DBLP:conf/wsdm/2010}, bibsource = {DBLP, http://dblp.uni-trier.de}
Online Version
TwitterRank: finding topic-sensitive influential twitterers
Summary
The paper addresses a sub-topic of social influence. Its primary goal is to find influential users on Twitter. It proposes TwitterRank, a variation of the PageRank algorithm, to measure the influence of Twitter users. TwitterRank takes into account the topical similarity between users along with the link structure when measuring the influence of users on their followers. The paper also studies the motives behind following a user and having mutual followers, and detects the presence of homophily in the network. Experimental results on a Twitter dataset show that TwitterRank performs significantly better than the baseline techniques.
Methodology
Homophily
To verify topical similarity in friendships, two questions are explored:
1. Are users with "following" relationships more topically similar than random users?
2. Are users with reciprocal "following" relationships more topically similar than those without them?
To answer these questions, topics are distilled from the Twitter text. The topics are extracted from user documents, where a user document is the list of all tweets by a user. Latent Dirichlet Allocation (LDA) is applied to learn the topics in an unsupervised manner. The result of applying LDA is represented by three matrices:
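1. <math>DT</math>, a <math>D \times T</math> matrix, where <math>D</math> is the number of Twitter users and <math>T</math> is the number of topics.
2. <math>WT</math>, a <math>W \times T</math> matrix, where <math>W</math> is the number of unique words and <math>T</math> is the number of topics.
3. <math>Z</math>, a <math>1 \times N</math> matrix, where <math>N</math> is the total number of words and <math>Z_i</math> is the topic assignment for word <math>w_i</math>.
The topical difference between users <math>i</math> and <math>j</math> is calculated as
<math>dist(i,j) = \sqrt{2 \ast D_{js}(i,j)}</math>
where <math>D_{js}(i,j)</math> is the Jensen-Shannon divergence between the topic distributions of the two users.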
Hypothesis testing on these topical differences is used to answer the two questions. The positive answers obtained to both questions justify the presence of topical similarity between users. This homophily motivates the use of TwitterRank to measure the topic-sensitive influence of users.
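The topical difference used above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, each user's topic distribution is assumed to be that user's row of topic counts normalized to sum to 1, and the logarithm base (2) is an assumption that bounds the divergence by 1.

```python
import math


def normalize(counts):
    """Turn a row of non-negative topic counts into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("a user needs at least one topic assignment")
    return [c / total for c in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_k = 0 contribute 0."""
    return sum(pk * math.log2(pk / qk) for pk, qk in zip(p, q) if pk > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL divergence to the midpoint distribution."""
    m = [(pk + qk) / 2 for pk, qk in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def topical_distance(counts_i, counts_j):
    """dist(i, j) = sqrt(2 * D_js(i, j)) for two users' topic-count rows."""
    p = normalize(counts_i)
    q = normalize(counts_j)
    return math.sqrt(2 * js_divergence(p, q))
```

Two users with identical topic mixtures get a distance of 0, and the distance grows as their mixtures diverge, which is the quantity the hypothesis tests compare across pairs of users.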
TwitterRank - topic-sensitive influence measure
A directed graph <math>D(V, E)</math> is constructed, where the vertex set V contains the Twitter users and the edge set E contains an edge from each Twitter user to each of their friends (the users whose tweets they follow). A random surfer visits each Twitter user with a certain probability by following the appropriate edge in <math>D</math>. TwitterRank performs a topic-sensitive random walk: the transition probability from one user to another is based on the topical similarity between the two users.
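The random walk described above can be illustrated with a short Python sketch for a single topic. It is a sketch under stated assumptions rather than the paper's exact formulation: the function name, the edge weight (the friend's tweet count multiplied by one minus the absolute difference in topic share), the row normalization, the damping value 0.85, and the teleportation vector are all illustrative choices.

```python
def twitterrank_sketch(follows, topic_share, tweet_counts, gamma=0.85, iterations=100):
    """Topic-sensitive random walk over a follower graph, for one topic.

    follows[u]      -> list of friends that user u follows (edge u -> friend)
    topic_share[u]  -> fraction of u's tweets about the topic, in [0, 1]
    tweet_counts[u] -> number of tweets published by u
    All weighting choices below are assumptions made for illustration.
    """
    users = list(topic_share)
    total_share = sum(topic_share.values())
    if total_share <= 0:
        raise ValueError("at least one user must tweet about the topic")
    # Assumed teleportation vector: proportional to each user's topic share.
    teleport = {u: topic_share[u] / total_share for u in users}

    # Assumed transition weights: friend's tweet volume times topical similarity.
    transition = {}
    for u in users:
        weights = {
            f: tweet_counts[f] * (1.0 - abs(topic_share[u] - topic_share[f]))
            for f in follows.get(u, [])
        }
        total = sum(weights.values())
        transition[u] = {f: w / total for f, w in weights.items()} if total > 0 else {}

    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {u: 0.0 for u in users}
        dangling = 0.0  # rank held by users with no usable outgoing edge
        for u in users:
            if not transition[u]:
                dangling += rank[u]
                continue
            for f, p in transition[u].items():
                new_rank[f] += gamma * rank[u] * p
        for u in users:
            # Teleportation plus redistribution of dangling mass keeps the total at 1.
            new_rank[u] += (1.0 - gamma + gamma * dangling) * teleport[u]
        rank = new_rank
    return rank
```

Because edges point from follower to friend, rank accumulates on users who are followed by topically similar users, which is the intuition behind measuring influence this way.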
Evaluation
TwitterRank (TR) is compared against the following baselines:
- In-degree (InD) - Measures the influence of Twitter users by the number of followers.
- PageRank (PR) - Measures the influence of Twitter users by making use of only the link structure.
- Topic-sensitive PageRank (TSPR) - Measures topic-sensitive influence without using topic-sensitive transition probabilities.
- Correlation between the rank lists generated by the different algorithms is measured using Kendall's correlation.
The inference from the table is that TR is most similar to TSPR.
- Performance in recommendation task.
The algorithm for recommendation task is as follows.
The comparison of different algorithms on the recommendation task.
The conclusion is that TR outperforms InD and PR by a large margin. TR also outperforms TSPR, because TSPR propagates a Twitter user's influence using the same transition probability for different topics.
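The rank-list comparison in this section relies on Kendall's correlation. A minimal sketch of that measure, assuming both lists rank the same users without ties (the tau-a variant; the function name is hypothetical):

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a between two rankings of the same items, best first, no ties."""
    if len(rank_a) != len(rank_b) or set(rank_a) != set(rank_b):
        raise ValueError("both rankings must contain exactly the same items")
    if len(set(rank_a)) != len(rank_a):
        raise ValueError("rankings must not contain duplicates")
    n = len(rank_a)
    if n < 2:
        raise ValueError("need at least two items to compare rankings")

    position_b = {item: index for index, item in enumerate(rank_b)}
    concordant = 0
    discordant = 0
    # combinations() yields pairs (x, y) with x ranked above y in rank_a.
    for x, y in combinations(rank_a, 2):
        if position_b[x] < position_b[y]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

A value of 1 means the two algorithms order the users identically, -1 means the orders are exactly reversed, and values near 0 mean the rankings are largely unrelated.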
Observations
- The study detects the presence of homophily in social networks. Reciprocal relationships between friends suggest topical similarity between them.
- The paper proposes a simple new methodology to measure the influence of users by considering topical similarity along with the link structure.
- The paper presents a static structure of the network. A possible and useful extension would be to model the dynamics of the Twitter network using TwitterRank.
- Other interactions between a user and their friends, such as mentions, replies, and retweets, could be given more weight than a mere friendship link.
Study Plan
- Some terminology used extensively in the paper: homophily, Jensen-Shannon divergence (Jensen's distance), and Kendall's correlation.
- Haveliwala et al. describe the topic-sensitive PageRank algorithm (TSPR).
- D. Rafiei et al. (2009) discuss computing web page reputations.
- McPherson et al. (2001) discuss homophily in social networks.
- Tunk Rank is similar to TwitterRank, but does not consider content-based interaction between users.