Huang 2010 Conversational Tagging in Twitter

Citation

Jeff Huang, Katherine M. Thornton and Efthimis N. Efthimiadis. 2010. Conversational Tagging in Twitter. In Proceedings of ACM HT.

Online version

An online version of this paper is available at [1].

Summary

This paper presents a study of Twitter tags in comparison with tags in other Web 2.0 systems. The authors report several findings on the differences and similarities between the two, and argue that Twitter tags are used more for filtering and directing content so that it appears in particular streams.

Key Contributions

The paper's key contribution is its findings on the differences between Twitter tags and tags in earlier systems. It characterizes old-style tags as a posteriori, applied to existing content to aid later retrieval, whereas Twitter-style tags are a priori, written as part of the message itself. The authors claim this is the first large-scale study of Twitter tags.

Dataset

The authors created their own dataset from two sources: Twitter and Delicious. They collected a sample of 42 million hashtags inserted by users into messages on the microblogging website Twitter, and a sample of 378 million tags created by users of the online bookmarking service Delicious to organize their bookmarks. Both samples record each tag along with the timestamp at which it was attached, which enables the temporal analysis that follows.
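
The paper does not say how the raw samples are stored; the sketch below assumes a simple tab-separated file with one tagging event per line, "<tag><tab><unix timestamp>", and groups the timestamps by tag. The file layout and the function name load_tag_events are assumptions for illustration, not details from the paper.

 from collections import defaultdict

 def load_tag_events(path):
     """Group tagging-event timestamps by tag (hypothetical TSV layout)."""
     events = defaultdict(list)
     with open(path, encoding="utf-8") as f:
         for line in f:
             # Each line is assumed to be "<tag>\t<unix_timestamp>";
             # the paper does not describe its actual storage format.
             tag, ts = line.rstrip("\n").split("\t")
             events[tag.lower()].append(int(ts))
     return events

This per-tag grouping is the shape of data that both the frequency ranking in the qualitative analysis and the temporal statistics in the quantitative analysis below operate on.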

Qualitative Analysis

The authors first present a qualitative analysis of the tags used in Twitter and Delicious. They went through the 224 most common tags in the Twitter dataset and the 304 most common tags in the Delicious dataset; a sketch of how such a frequency ranking might be computed appears after the list below. From this review they draw three key insights:

  • Trending Effect

Twitter started displaying trending topic information on its front page. These trending topic lists are individually linked to the current set of tweets composed on that topic. While tweets without hashtags were also displayed in trending topic lists, the act of tagging a tweet increased the likelihood that it would be displayed in the group of tweets on a trending topic.

  • Conversational vs. Organizational

Tagging practices in Twitter are an example of a new type of tagging, which the authors call 'conversational' tagging. In conversational tagging, the tag itself is an important piece of the message. The tag can either serve as a label in the traditional sense, or as a prompt for user comment. In many trending topics, the tag serves as such a prompt, and the resulting content is an asynchronous, massively multi-person conversation. While these are not the only types of tags used in Twitter, the authors argue that this is a type of tagging behavior that emerged due to the structure of the Twitter system.

  • Micro-memes

The authors present an interesting case they call the "micro-meme". They pick #igrewupon, #liesmentell, #igottacrushon and #90stweet as examples of hashtags observed in Twitter that are associated with emergent micro-memes. These hashtags are rarely used to retrieve old tweets; instead, they provide synchronic metadata that funnels related tweets into common streams.
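
As referenced above, the qualitative review starts from the most frequent tags in each sample. A frequency ranking of that kind could be produced from the per-tag event lists with a standard counter; this is an illustrative sketch building on the hypothetical load_tag_events helper above, not the authors' code.

 from collections import Counter

 def most_common_tags(events, n):
     """Rank tags by the number of tagging events they appear in."""
     counts = Counter({tag: len(stamps) for tag, stamps in events.items()})
     return counts.most_common(n)

 # e.g. the 224 most frequent tags in the Twitter sample:
 # top_twitter = most_common_tags(load_tag_events("twitter_tags.tsv"), 224)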

Quantitative (Statistical) Analysis

The authors conducted a statistical analysis focused on the temporal behavior of Twitter hashtags, reporting the standard deviation, skew and kurtosis of the timestamps of the messages carrying each hashtag.
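
These three statistics are standard moments of each tag's timestamp distribution. A minimal way to compute them per tag, using NumPy and SciPy (a tooling choice for illustration; the paper does not name its implementation):

 import numpy as np
 from scipy.stats import skew, kurtosis

 def temporal_profile(timestamps):
     """Standard deviation, skew and kurtosis of a tag's usage timestamps."""
     ts = np.asarray(timestamps, dtype=float)
     return {
         "std": ts.std(),           # how spread out usage is over time
         "skew": skew(ts),          # whether usage is weighted early or late
         "kurtosis": kurtosis(ts),  # how sharply usage peaks around its center
     }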

They find that a low standard deviation is a good indicator that a tag is used for conversational (i.e. social) rather than organizational purposes, while a tag with a high standard deviation tends to mark a group of topically related tweets. Two figures use the skew to illustrate the gradual adoption of the tag #twitterafterdark and the slow abandonment of #postcrossing over the course of 2009. From the fourth moment, kurtosis, two further figures illustrate how to measure a tag's staying power.
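
Turning that finding into a rule requires a cutoff on the standard deviation, which the paper does not give; the heuristic below is therefore only a hypothetical reading of the result, with the threshold left as a parameter (for instance, the median standard deviation across all tags in a sample).

 def label_tag(profile, std_cutoff):
     """Hypothetical heuristic based on the paper's finding: a low timestamp
     standard deviation suggests conversational (social) use, a high one an
     organizational (topical) tag. The cutoff is an assumed parameter, not a
     value reported by the authors.
     """
     return "conversational" if profile["std"] < std_cutoff else "organizational"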

Discussion

This paper gives a broad overview of Twitter hashtags, in particular from the user's perspective. It is thus highly relevant to our proposed course project on automatic clustering of Twitter messages based on hashtags.