Davidov et al COLING 10

From Cohen Courses
Revision as of 23:15, 1 October 2012 by Anikag

This is a Paper reviewed for Social Media Analysis 10-802 in Fall 2012.

Citation

author    = {Dmitry Davidov and
              Oren Tsur and
              Ari Rappoport},
 title     = {Enhanced Sentiment Learning Using Twitter Hashtags and Smileys},
 booktitle = {COLING (Posters)},
 year      = {2010},
 pages     = {241-249},
 ee        = {http://aclweb.org/anthology-new/C/C10/C10-2028.pdf},
 crossref  = {DBLP:conf/coling/2010p},
 bibsource = {DBLP, http://dblp.uni-trier.de}

Online Version

Enhanced sentiment learning using Twitter hashtags and smileys

Summary

The paper proposes a supervised framework for sentiment classification based on Twitter data. It goes beyond positive and negative labels by using 50 Twitter hashtags and 15 smileys as sentiment labels. Short texts (tweets) are sometimes tagged by their authors with such hashtags or smileys, and the tag effectively assigns a sentiment value to the tweet. The paper uses this tagged Twitter data to classify a wide variety of sentiment types from text. It evaluates the four kinds of features used and shows that the framework successfully identifies the sentiment types of untagged tweets. It also explores the relation between different sentiment types. The overall goal is automated identification of diverse sentiment types.
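
The labeling idea in the summary can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the label sets are hypothetical stand-ins (the 50 hashtags and 15 smileys are not listed on this page), the function name `label_tweet` is made up, whitespace tokenization is assumed, and removing the label token from the text is an assumption of the sketch.

```python
# Hypothetical label sets; the paper uses 50 hashtags and 15 smileys,
# which are not listed in this summary.
LABEL_HASHTAGS = {"#happy", "#sad", "#bored"}
LABEL_SMILEYS = {":)", ":(", ";)"}


def label_tweet(text):
    """Return (labels, cleaned_text) for one tweet.

    Tokens found in a label set become sentiment labels and are removed
    from the text, so a classifier cannot simply read the label back.
    """
    labels = []
    kept = []
    for token in text.split():
        if token.lower() in LABEL_HASHTAGS:
            # Hashtags are matched case-insensitively.
            labels.append(token.lower())
        elif token in LABEL_SMILEYS:
            labels.append(token)
        else:
            kept.append(token)
    return labels, " ".join(kept)
```

A tweet that yields an empty label list would count as untagged here, which is the kind of text the trained classifier is meant to handle.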

Methodology

The features used to classify sentiment fall into four broad types.

- Single-word features
  Each word appearing in a tweet is treated as a binary feature whose weight is the inverse of the word's count in the corpus.
- n-gram features
  Each sequence of 2-5 consecutive words is treated as a binary feature, weighted in the same way as the single-word features.
- Pattern-based features
  Words are classified as high-frequency words (HFW) or content words (CW) according to frequency thresholds. A pattern is an ordered sequence of HFWs and slots for CWs, containing 2-6 HFWs and 1-5 slots for CWs. The weight for the pattern
- Punctuation features
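
The single-word and n-gram weighting described above can be sketched as below. This is an illustrative reading, not the paper's implementation: "inverted count" is taken to mean 1 / (number of occurrences in the corpus), tweets are assumed to be already tokenized, and the helper names `ngrams`, `corpus_counts`, and `features` are hypothetical. Pattern and punctuation features are left out because this page does not give enough detail about them.

```python
from collections import Counter


def ngrams(tokens, n):
    """All consecutive n-word sequences in tokens, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def corpus_counts(corpus, max_n=5):
    """Count every word (n=1) and every 2..max_n-gram across the corpus."""
    counts = Counter()
    for tokens in corpus:
        for n in range(1, max_n + 1):
            counts.update(ngrams(tokens, n))
    return counts


def features(tokens, counts, max_n=5):
    """Map each word or n-gram present in tokens to 1 / its corpus count.

    Features are binary (present or absent), so repeats inside one
    tweet do not change the weight. Items unseen in the corpus are skipped.
    """
    feats = {}
    for n in range(1, max_n + 1):
        for gram in set(ngrams(tokens, n)):
            if counts[gram] > 0:
                feats[gram] = 1.0 / counts[gram]
    return feats


# Toy corpus of two tokenized tweets (made-up data).
corpus = [["i", "love", "this"], ["i", "hate", "this"]]
counts = corpus_counts(corpus)
feats = features(["i", "love", "this"], counts)
```

With this weighting, rare words and n-grams get larger weights than frequent ones: in the toy corpus, `("i",)` appears twice and `("love",)` once, so the latter is weighted more heavily.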


Conclusion

By using hashtags and smileys as labels, the framework avoids the need for labor-intensive manual annotation while still allowing identification and classification of diverse sentiment types in short texts.

Related Work

- McDonald et al. (2007)
- Davidov and Rappoport (2006)
- Davidov and Rappoport (2008)