Compare Ramage Naaman

From Cohen Courses

Latest revision as of 22:13, 5 November 2012

Two Papers

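1. Ramage et al., ICWSM 2010: http://www.stanford.edu/~dramage/papers/twitter-icwsm10.pdf

2. Naaman et al., 2010: http://dl.acm.org/citation.cfm?id=1718953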

Problem

In the Ramage paper, the authors argue that the topics found in tweets can be grouped into four categories:

- Substance: topics about events and ideas
- Social: topics recognizing language used toward a social end
- Status: topics denoting personal updates
- Style: topics capturing broader trends in language usage

They modeled the latent topics in each tweet with Labeled LDA, a partially supervised variant of the LDA topic model.
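To make the topic-modeling step concrete, here is a minimal sketch of plain, unsupervised LDA fitted with collapsed Gibbs sampling, using only the Python standard library. It illustrates the general technique and is not the Ramage et al. implementation: the function names `lda_gibbs` and `top_words`, the hyperparameters `alpha` and `beta`, and the toy tweets are assumptions made for this example, and no claim is made that the paper used these settings or this inference procedure.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Plain LDA fitted with collapsed Gibbs sampling (illustrative sketch).

    docs: list of token lists, one list per tweet.
    Returns (doc_topic_counts, topic_word_counts, vocab).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    # Count tables used by the collapsed sampler.
    doc_topic = [[0] * num_topics for _ in docs]       # n_{d,k}
    topic_word = [[0] * V for _ in range(num_topics)]  # n_{k,w}
    topic_total = [0] * num_topics                     # n_k

    # Random initial topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the current token from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # p(z=k | rest) is proportional to
                # (n_{d,k} + alpha) * (n_{k,w} + beta) / (n_k + V * beta)
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return doc_topic, topic_word, vocab


def top_words(topic_word, vocab, n=3):
    """Return the n highest-count words for each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in topic_word
    ]


if __name__ == "__main__":
    # Hypothetical toy tweets, already tokenized.
    tweets = [
        "election vote debate candidate".split(),
        "vote election polls today".split(),
        "coffee tired monday morning".split(),
        "so tired need coffee".split(),
    ]
    _, tw, vocab = lda_gibbs(tweets, num_topics=2, iterations=100)
    for k, words in enumerate(top_words(tw, vocab)):
        print(k, words)
```

Collapsed Gibbs sampling integrates out the per-tweet topic mixtures and the per-topic word distributions, so only the token-level topic assignments and their count tables have to be stored and updated.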

In the Naaman paper, the authors also categorized tweets by their underlying social meaning. However, they arrived at nine categories instead of four:

- Information Sharing (IS)
- Self Promotion (SP)
- Opinions/Complaints (OC)
- Statements and Random Thoughts (RT)
- Me now (ME): talking about the user's own feelings
- Question to followers (QF)
- Presence Maintenance (PM)
- Anecdote (me) (AM)
- Anecdote (other) (AO)

Although the two categorization schemes are quite different, both look at the latent meaning of tweets and try to connect a tweet's words to its social meaning.

Algorithm

The Ramage paper used a Labeled LDA topic model to model each tweet, while the Naaman paper used various quantitative (statistical) methods, such as Pearson's chi-square test, Kalensky's analysis, and Ward's linkage cluster analysis.
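As a hedged illustration of one of the statistical methods named above, the sketch below computes the uncorrected Pearson chi-square statistic of independence for a contingency table, for instance counts of two tweet categories (IS and ME) split across two user groups. The function name `chi_square_independence` and the counts are hypothetical and invented for this example; they do not come from the Naaman paper.

```python
def chi_square_independence(table):
    """Pearson's chi-square statistic for an r x c contingency table.

    Returns (statistic, degrees_of_freedom, expected_counts).
    No continuity correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    if any(t == 0 for t in row_totals + col_totals):
        raise ValueError("every row and column needs a nonzero total")
    n = sum(row_totals)
    # Expected count under independence: row_total * col_total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof, expected


if __name__ == "__main__":
    # Hypothetical counts: rows = user groups A and B, columns = IS and ME tweets.
    counts = [[30, 10],
              [20, 40]]
    stat, dof, _ = chi_square_independence(counts)
    print(f"{stat:.2f}", dof)  # prints "16.67 1"
```

To judge significance, compare the statistic with a chi-square critical value for the returned degrees of freedom, e.g. 3.841 for one degree of freedom at the 0.05 level. SciPy's `scipy.stats.chi2_contingency` performs the same test and also returns a p-value, but by default it applies Yates' continuity correction when there is one degree of freedom, so its statistic for a 2x2 table will be smaller than the uncorrected value computed here.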

Dataset

The Ramage paper used Twitter data the authors crawled over one week. The Naaman paper used Twitter data crawled in a similar manner over three weeks.

Big Idea

Both papers categorize tweets into classes that represent social meaning, and both then examine what can be learned about users when their posting practices are characterized by the types of tweets they publish.
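One way to make "characterizing users by the types of tweets they publish" concrete is to turn each user's coded tweets into a vector of category proportions and then group users with Ward's agglomerative clustering, which the Algorithm section lists among the Naaman paper's methods. The sketch below is an assumption-laden illustration, not the paper's procedure: the `(user, code)` input format, the function names `category_profiles` and `ward_clusters`, and the toy data are hypothetical, and the paper's actual features and distance may differ. It uses the nine category codes listed in the Problem section.

```python
from collections import Counter

# Category codes from the Naaman paper's scheme, as listed above.
CATEGORIES = ["IS", "SP", "OC", "RT", "ME", "QF", "PM", "AM", "AO"]


def category_profiles(coded_tweets):
    """Turn (user, category_code) pairs into per-user category proportions."""
    counts = {}
    for user, code in coded_tweets:
        counts.setdefault(user, Counter())[code] += 1
    profiles = {}
    for user, c in counts.items():
        total = sum(c.values())
        profiles[user] = [c[cat] / total for cat in CATEGORIES]
    return profiles


def ward_clusters(points, num_clusters):
    """Naive agglomerative clustering with Ward's criterion.

    points: dict mapping name -> vector. At each step, merge the pair of
    clusters whose union least increases the total within-cluster sum of
    squares: n_a * n_b / (n_a + n_b) * ||centroid_a - centroid_b||^2.
    """
    clusters = [([name], list(vec)) for name, vec in points.items()]  # (members, centroid)
    while len(clusters) > num_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                na, nb = len(ma), len(mb)
                sq = sum((x - y) ** 2 for x, y in zip(ca, cb))
                cost = na * nb / (na + nb) * sq
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        na, nb = len(ma), len(mb)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        # Delete b first (b > a) so index a stays valid.
        del clusters[b]
        clusters[a] = (ma + mb, centroid)
    return [sorted(members) for members, _ in clusters]


if __name__ == "__main__":
    # Hypothetical hand-coded tweets: (user, category code).
    coded = ([("u1", "ME")] * 3 + [("u1", "IS")] + [("u2", "ME")] * 4
             + [("u3", "IS")] * 4 + [("u4", "IS")] * 3 + [("u4", "ME")])
    print(ward_clusters(category_profiles(coded), num_clusters=2))
```

This naive version recomputes every pairwise merge cost at each step, which is fine for a handful of users; for larger samples, SciPy's `scipy.cluster.hierarchy.linkage(X, method="ward")` is the usual library route.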

Questions

1. How much time did you spend reading the (new, non-wikified) paper you summarized?

1 hour


2. How much time did you spend reading the old wikified paper?

20 min


3. How much time did you spend reading the summary of the old paper?

10 min


4. How much time did you spend reading background material?

30 min


5. Was there a study plan for the old paper? If so, did you read any of the items suggested by the study plan, and how much time did you spend reading them?

No study plan


6. Give us any additional feedback you might have about this assignment.