Difference between revisions of "Sun et. al., ICDM 2005"

From Cohen Courses
Jump to navigationJump to search
Line 3: Line 3:
 
== Citation ==
 
== Citation ==
  
Neighborhood Formation and Anomaly Detection in Bipartite Graphs,
+
Measuring user influence in twitter: The million follower fallacy,
Jimeng Sun, Huiming Qu, Deepayan Chakrabarti, Christos Faloutsos, ICDM 2005
+
Cha, M. and Haddadi, H. and Benevenuto, F. and Gummadi, K.P., ICWSM 2010
  
 
== Online version ==
 
== Online version ==
  
[http://www.cs.cmu.edu/~deepay/mywww/papers/icdm05.pdf Neighborhood Formation and Anomaly Detection in Bipartite Graphs]
+
[http://snap.stanford.edu/class/cs224w-readings/cha10influence.pdf Measuring User Influence in Twitter: The Million Follower Fallacy
 +
]
  
 
== Summary ==
 
== Summary ==
  
This paper poses two interesting social problems on [[bipartite graph]] named [[AddressesProblem::Neighborhood formation and Anomaly detection]]. They also propose solutions based on [[UsesMethod::Random walk with restart]].  
+
This paper characterize  influence of a user in social media. A more influential user causes others who are connected with him to act in certain way. They first define three different measures for influence. Then they investigate whether the influence of a user hold across all topics?
 +
 
 +
Twitter data collected for 54M users is used for experiment. Total number of follow links present in the dataset is 1,963,263,821. Dataset has 1,755,925,520 number of tweets.
 +
 
 +
Three different measures of influence used are:
 +
 
 +
1. Indegree: Total number of followers. Essentially it computes how many people at most read tweets of a particular twitter.
 +
2. Mention:  Number of mentions for a twitter. This reflects how many people has listened to a tweeter and mentioned him/her in their tweets.
 +
3. Retweets: Number of retweets for a specific user.
 +
 
 +
 
 +
 
 +
 
 +
== Findings ==
 +
Influence computed using above measures is then compared using Spearman’s rank correlation[http://en.wikipedia.org/wiki/Spearman's_rank_correlation_coefficient]
 +
 
 +
Correlation between measures is as following:
 +
Indegree vs retweets: .549
 +
Indegree vs mentions: .638
 +
Retweets vs mentions: .580
 +
 
 +
This shows that there is stronger correlation between Indegree and mentions.
 +
 
 +
To compute influence for a particular topic, they choose tweets related to three topics. These topics are the Iranian presidential election, the outbreak of the H1N1 influenza, and the death of Michael Jackson. Among the users who tweeted about these topics, there were 13,219 users who tweeted about all three topics.
 +
 
 +
Table: Spearman’s rank correlation coefficients over topics
 +
Retweet Mentions
 +
Topics Top 10% 1% Top 10% 1%
 +
Iran vs. Swine 0.54 0.62 0.59 0.68
 +
Iran vs. Jackson 0.48 0.54 0.59 0.63
 +
Swine vs. Jackson 0.55 0.50 0.80 0.68
 +
 
 +
 
 +
 
 +
 
  
The experimented on 3 real world social graphs [[UsesDataset::Conference-Author dataset]], [[UsesDataset::Author-Paper dataset]] and [[UsesDataset::IMDB dataset]]
 
  
== Evaluation ==
 
  
They evaluate their methods by asking following 4 questions :
 
  - Does NF find out meaningful neighborhoods?
 
  - How close is Approximate NF to exact NF?
 
  - Can AD detect injected anomalies?
 
  - How much time these methods take to run on graphs of varying sizes?
 
  
 
== Discussion ==
 
== Discussion ==

Revision as of 22:46, 26 September 2012

This a Paper discussed in Social Media Analysis 10-802 in Spring 2010.

Citation

Measuring user influence in twitter: The million follower fallacy, Cha, M. and Haddadi, H. and Benevenuto, F. and Gummadi, K.P., ICWSM 2010

Online version

[http://snap.stanford.edu/class/cs224w-readings/cha10influence.pdf Measuring User Influence in Twitter: The Million Follower Fallacy ]

Summary

This paper characterize influence of a user in social media. A more influential user causes others who are connected with him to act in certain way. They first define three different measures for influence. Then they investigate whether the influence of a user hold across all topics?

Twitter data collected for 54M users is used for experiment. Total number of follow links present in the dataset is 1,963,263,821. Dataset has 1,755,925,520 number of tweets.

Three different measures of influence used are:

1. Indegree: Total number of followers. Essentially it computes how many people at most read tweets of a particular twitter. 2. Mention: Number of mentions for a twitter. This reflects how many people has listened to a tweeter and mentioned him/her in their tweets. 3. Retweets: Number of retweets for a specific user.



Findings

Influence computed using above measures is then compared using Spearman’s rank correlation[1]

Correlation between measures is as following: Indegree vs retweets: .549 Indegree vs mentions: .638 Retweets vs mentions: .580

This shows that there is stronger correlation between Indegree and mentions.

To compute influence for a particular topic, they choose tweets related to three topics. These topics are the Iranian presidential election, the outbreak of the H1N1 influenza, and the death of Michael Jackson. Among the users who tweeted about these topics, there were 13,219 users who tweeted about all three topics.

Table: Spearman’s rank correlation coefficients over topics Retweet Mentions Topics Top 10% 1% Top 10% 1% Iran vs. Swine 0.54 0.62 0.59 0.68 Iran vs. Jackson 0.48 0.54 0.59 0.63 Swine vs. Jackson 0.55 0.50 0.80 0.68





Discussion

This paper poses two important social problems related to bipartite social graphs and explained how those problems can be solved efficiently using random walks.

They also claim that the neighborhoods over nodes can represent personalized clusters depending on different perspectives.

During presentation one of the audiences raised question about is anomaly detection in this paper similar to betweenness of edges defined in Kleinber's text as discussed in Class Meeting for 10-802 01/26/2010. I think they are similar. In the texbook they propose, detecting edges with high betweenness and using them to partition the graph. In this paper they first try to create neighbourhood partitions based on random walk prbabilities and which as a by product gives us nodes and edges with high betweenness value.

Related papers

There has been a lot of work on anomaly detection in graphs.

  • The paper by Moonesinghe and Tan ICTAI06 finds the clusters of outlier objects by doing random walk on the weighted graph.
  • The paper by Aggarwal SIGMOD 2001 proposes techniques for projecting high dimensional data on lower dimensions to detect outliers.

Study plan

  • Article:Bipartite graph:[2]
  • Article:Anomaly detection:[3]
  • Paper:Topic sensitive pagerank:[4]
    • Paper:The PageRank Citation Ranking: Bringing Order to the Web:[5]
  • Paper:Multilevel k-way Partitioning Scheme for Irregular Graphs:[6]