Difference between revisions of "Davidov et al COLING 10"

From Cohen Courses
 
(9 intermediate revisions by the same user not shown)
Line 17: Line 17:
  
 
== Summary ==
 
== Summary ==
The paper proposes a supervised framework for [[AddressesProblem::Sentiment Classification|sentiment classification]] utilizing the [[UsesDataset::Twitter Dataset For Sentiment|Twitter dataset]]. The paper classifies the sentiment beyond the positive and negative labels by utilizing the 50 Twitter tags and 15 smileys as sentiment labels.
+
The paper proposes a supervised framework for [[AddressesProblem::Sentiment Classification|sentiment classification]] utilizing the [[UsesDataset::Twitter Dataset For Sentiment|Twitter dataset]]. The paper classifies the sentiment beyond the positive and negative labels by using the 50 Twitter tags and 15 smileys as sentiment labels.
The short textual sentences ( tweet) are sometimes labeled as sentiment tags, which assigns sentiment values to the tweet. The paper utilizes such tagged Twitter data for classification of a wide variety of sentiment types from text. The paper employs different kinds of features and shows that the framework successfully identifies sentiment types of the text(blogs and tweets).
+
The short textual sentences(tweet) of Twitter are sometimes labeled as sentiment tags, which assigns sentiment values to the tweet. The paper utilizes such tagged Twitter data for classification of a wide variety of sentiment types from text. The paper employs different kinds of features and shows that the framework successfully identifies sentiment types of the text(blogs and tweets).
  
 
== Methodoloy ==
 
== Methodoloy ==
* Features used
+
* '''Features''' - The features employed in classifying the sentiments can be broadly divided into 4 distinct types.
The features employed in classifying the sentiments can be broadly divided into 4 distinct types.
+
**''Single-word features'' - They are considered as binary features with weight equal to the inverted count in the corpus.
  * Single-word features
+
**''n-gram features'' - The 2-5 length of consecutive words are considered as binary features with the same weighting as for the single-word features.
    They are considered as binary features with weight equal to the inverted count in the corpus
+
**''Pattern-based features'' - The words are classified as '''High frequency words(HFW)''' and '''content words(CW)''' .
  * n-gram features
+
***A pattern is defined as an ordered sequence of HFW and slots for CW based on the frequency threshold.   
    The 2-5 length of consecutive words are considered as binary features with the same weighting as for the single-word features
+
***In this paper, a pattern contains 2-6 HFWs and 1-5 slots for CWs. It is based on the works of
  * Pattern-based features
+
***The weight for the pattern is assigned to the degree of match of the pattern - Exact match, Sparse match, Incomplete match and No match.
  The words are classified as High frequency words (HFW) and content words (CW).
+
*''Punctuation features'' - The weight assigned is the average weight of the respective features.
  A pattern is defined as an ordered sequence of HFW and slots for CW based on the frequency threshold.   
+
**Length of a sentence.
  A pattern is defined as containing 2-6 HFWs and 1-5 slots for CWs.  
+
**Number of exclamations
  The weight for the pattern is assigned to the degree of match of the pattern - Exact match, Sparse match, Incomplete match and No match.
+
**Number of question marks
  * Punctuation features
+
**Number of quotes
  (1) Length of a sentence. Considering per sentence, (2) Number of exclamations, (3) Number of question marks, (4) Number of quotes, (5) Number of capital words.
+
**Number of capital words.  
  The weight assigned is the average weight of the respective features.
 
  
*Classification
+
*'''Classification Algorithm'''
The algorithm used to assign the sentiment label to test examples is a slight modification of the [[UsesMethod::K-Nearest_Neighbor|k-NN algorithm]].
+
The algorithm used to assign the sentiment label to test examples is a slight modification of the [[UsesMethod::K-Nearest_Neighbor|k-NN algorithm]].
 
 
== Dataset ==
 
The dataset consists of 475 million public tweets from May 2009 to Jan 2010. All non-English characters are removed, and url links, hashtags and references have been replaced by URL/REF/TAG words. The content hashtags are treated as labels for the classification task. The sentiment labels are either hashtag-based or smiley-based.
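The token replacement described above can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the regular expressions and the function name `normalize_tweet` are assumptions, and the removal of non-English characters is not covered.

```python
import re

# Hypothetical patterns; the paper's exact tokenization rules are not given here.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
REF_RE = re.compile(r"@\w+")
TAG_RE = re.compile(r"#\w+")


def normalize_tweet(text):
    """Return (normalized_text, hashtags) for a single tweet.

    URLs, user references and hashtags are replaced by the meta-words
    URL, REF and TAG. The original hashtags are returned separately so
    they can serve as candidate sentiment labels.
    """
    # Replace URLs first so that a '#' inside a URL is not read as a hashtag.
    text = URL_RE.sub("URL", text)
    hashtags = [tag.lower() for tag in TAG_RE.findall(text)]
    text = REF_RE.sub("REF", text)
    text = TAG_RE.sub("TAG", text)
    return text, hashtags


normalized, tags = normalize_tweet("@bob check http://t.co/x #happy")
print(normalized, tags)
```

Returning the hashtags alongside the normalized text keeps the label source separate from the features, so the label itself does not leak into the feature vector.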
 
*Hash-tag based labels - The frequent tags over the entire dataset were calculated and two human judges labeled them into five different categories: 1. strong sentiment, 2. most likely sentiment, 3. context-dependent sentiment, 4. focused sentiment and 5. no sentiment.
 
 
 
The following table shows the annotation result.
 
 
 
[[File:Annotate.png]]
 
 
 
*Smiley based labels - Amazon Mechanical Turk (AMT) is used to obtain a list of common and unambiguous ASCII smileys. 
 
The 50 hashtag-based labels from the strong sentiment and most likely sentiment categories, along with 15 smiley-based labels, are used as labels for the classification task.  
 
  
 
== Evaluation ==
 
== Evaluation ==
*Evaluation using cross-validation
+
===Evaluation using cross-validation===
 
The sentiment classification is evaluated using 10-fold cross-validation over the training set. The performance of the algorithm was tested under different feature settings.  
 
The sentiment classification is evaluated using 10-fold cross-validation over the training set. The performance of the algorithm was tested under different feature settings.  
** Multi-class classification
+
- Multi-class classification - There are 51 hashtag-based and 16 smiley based labels. The evaluation metric is the average f-score for 10-fold cross validation. The f-score for the random baseline is 0.02. The result is shown in the following table.  
There are 51 hashtag-based and 16 smiley based labels. The evaluation metric is the average f-score for 10-fold cross validation. The f-score for the random baseline is 0.02. The result is shown in the following table.  
 
  
 
[[File:Multi.png]]
 
[[File:Multi.png]]
  
 
The result is significantly better than the random baseline.
 
The result is significantly better than the random baseline.
** Binary classification
+
 
 +
- Binary classification
 
The labels are 1 if the sentence contains a particular label or 0 if the sentence does not bear any sentiment. For each of the 50 hashtag-based and 15 smiley-based labels, the binary classification is performed. The result is as shown in the following table.
 
The labels are 1 if the sentence contains a particular label or 0 if the sentence does not bear any sentiment. For each of the 50 hashtag-based and 15 smiley-based labels, the binary classification is performed. The result is as shown in the following table.
  
Line 66: Line 53:
  
 
The results show that binary classification is better than the multi-class classification with a high precision value.
 
The results show that binary classification is better than the multi-class classification with a high precision value.
* Evaluation with human judges
+
===Evaluation with human judges===
Amazon Mechanical Turk (AMT) services was used to evaluate the performance of the classifier on test data. Te evaluation was considered correct if one of the tags selected by a human judge for a sentence was one of the 5 tags predicted by the algorithm. The correlation score for this task was <math>\kappa = 0.41</math>.  
+
[https://www.mturk.com/mturk/welcome Amazon Mechanical Turk (AMT)] services was used to evaluate the performance of the classifier on test data. Te evaluation was considered correct if one of the tags selected by a human judge for a sentence was one of the 5 tags predicted by the algorithm. The correlation score for this task was <math>\kappa = 0.41</math>.
 +
 
 
== Observations ==
 
== Observations ==
 
* This work presents a supervised classification framework for which utilizes Twitter hashtags and smileys as proxies for different sentiment types as labels. It contributes to avoiding the need for labor intensive manual annotation, allowing identification and classification of diverse sentiment types of short texts.
 
* This work presents a supervised classification framework for which utilizes Twitter hashtags and smileys as proxies for different sentiment types as labels. It contributes to avoiding the need for labor intensive manual annotation, allowing identification and classification of diverse sentiment types of short texts.
 
* Binary classification of sentiments yields better results than multi-class classification.
 
* Binary classification of sentiments yields better results than multi-class classification.
 
* Punctuation, word and pattern features contributes more towards classification performance, as compared to a small marginal boost provided by the the n-gram features. Pattern features provides better performance as compared to the combined effect of the rest of the features.
 
* Punctuation, word and pattern features contributes more towards classification performance, as compared to a small marginal boost provided by the the n-gram features. Pattern features provides better performance as compared to the combined effect of the rest of the features.
* Explored inter-sentiment overlap and dependency by two simple techniques of tag occurrence and feature overlap.  
+
* Preliminary exploration on inter-sentiment overlap and dependency by two simple techniques of tag occurrence and feature overlap.
 +
* In addition, to the list of features used in the algorithm, features representing the short-term and long-term distance in the tweets could also be added.
 +
* The evaluation could also be performed on blog data other than the tweets to validate the usage of  the semantic labels in other text documents.
 +
 
 
== Study Plan ==
 
== Study Plan ==
[[UsesMethod::K-Nearest_Neighbor|K-Nearest Neighbor]]
+
* [http://www.cse.huji.ac.il/~arir/sat.pdf Davidov and Rappoport (2008)] for understanding the automated pattern based approach for extracting sentiments.
 
+
**[http://dl.acm.org/citation.cfm?id=1220213 Davidov and Rappoport (2006)]
[[UsesMethod::Multi-class Classification|Multi-class Classification]]
+
* [http://staff.science.uva.nl/~gilad/pubs/style2005-blogmoods.pdf Mishne (2005)]- study on mood classification
 
== Related Work ==
 
== Related Work ==
 
A similar work on extracting sentiment types on blogs was carried by [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.116.5334 McDonal et al (2007)].
 
A similar work on extracting sentiment types on blogs was carried by [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.116.5334 McDonal et al (2007)].
 
The automated pattern based approach for extracting sentiments is based on [http://dl.acm.org/citation.cfm?id=1220213 Davidov and Rappoport (2006)] and [http://www.cse.huji.ac.il/~arir/sat.pdf Davidov and Rappoport (2008)].
 

Latest revision as of 09:54, 4 October 2012

This is a paper reviewed for Social Media Analysis 10-802 in Fall 2012.

Citation

author    = {Dmitry Davidov and
              Oren Tsur and
              Ari Rappoport},
 title     = {Enhanced Sentiment Learning Using Twitter Hashtags and Smileys},
 booktitle = {COLING (Posters)},
 year      = {2010},
 pages     = {241-249},
 ee        = {http://aclweb.org/anthology-new/C/C10/C10-2028.pdf},
 crossref  = {DBLP:conf/coling/2010p},
 bibsource = {DBLP, http://dblp.uni-trier.de}

Online Version

Enhanced sentiment learning using Twitter hashtags and smileys

Summary

The paper proposes a supervised framework for sentiment classification using the Twitter dataset. It goes beyond positive and negative labels by using 50 Twitter hashtags and 15 smileys as sentiment labels. Tweets, the short texts posted on Twitter, are sometimes tagged by their authors with sentiment hashtags, which in effect assign a sentiment value to the tweet. The paper uses such tagged Twitter data to classify a wide variety of sentiment types from text. It employs several kinds of features and shows that the framework successfully identifies the sentiment types of text (blogs and tweets).

Methodology

  • Features - The features used to classify sentiment fall into 4 distinct types.
    • Single-word features - Each word is a binary feature, weighted by the inverse of its count in the corpus.
    • n-gram features - Sequences of 2-5 consecutive words are treated as binary features, with the same weighting as the single-word features.
    • Pattern-based features - Words are classified as high frequency words (HFW) or content words (CW), based on a frequency threshold.
      • A pattern is defined as an ordered sequence of HFWs and slots for CWs.
      • In this paper, a pattern contains 2-6 HFWs and 1-5 slots for CWs. The approach is based on the work of Davidov and Rappoport (2006, 2008).
      • The weight of a pattern depends on the degree to which it matches the sentence: exact match, sparse match, incomplete match or no match.
    • Punctuation features - The weight assigned is the average weight of the respective features.
      • Length of the sentence
      • Number of exclamation marks
      • Number of question marks
      • Number of quotes
      • Number of capital words
  • Classification Algorithm

The algorithm used to assign the sentiment label to test examples is a slight modification of the k-NN algorithm.
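The single-word and punctuation features, together with a nearest-neighbour vote, can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's implementation: it uses plain k-NN with Euclidean distance and majority vote (the paper's "slight modification" is not reproduced), it omits n-gram and pattern features, and all function names, the value of `k`, and the reading of "capital words" as fully upper-case tokens are hypothetical choices.

```python
import math
from collections import Counter


def word_weights(corpus):
    """Map each word to the inverse of its count in the corpus."""
    counts = Counter(w for sentence in corpus for w in sentence.lower().split())
    return {w: 1.0 / c for w, c in counts.items()}


def punctuation_counts(sentence):
    """Raw punctuation statistics (length is measured in tokens, an assumption)."""
    words = sentence.split()
    return {
        "length": len(words),
        "exclamations": sentence.count("!"),
        "questions": sentence.count("?"),
        "quotes": sentence.count('"'),
        "capital_words": sum(1 for w in words if w.isupper()),
    }


def featurize(sentence, weights):
    """Binary single-word features: a present word contributes its weight."""
    return {w: weights[w] for w in set(sentence.lower().split()) if w in weights}


def distance(a, b):
    """Euclidean distance between two sparse vectors stored as dicts."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def knn_label(query, train, k=3):
    """Plain majority vote among the k nearest (vector, label) training pairs."""
    nearest = sorted(train, key=lambda item: distance(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data, invented for illustration only.
corpus = ["so happy today", "happy happy day", "so sad today", "sad and tired"]
labels = ["#happy", "#happy", "#sad", "#sad"]
weights = word_weights(corpus)
train = [(featurize(s, weights), y) for s, y in zip(corpus, labels)]

print(knn_label(featurize("feeling sad and tired", weights), train))
print(punctuation_counts('WOW "really"?!'))
```

Storing vectors as dictionaries keeps the sketch short for sparse text features; a real system over millions of tweets would need an indexed or approximate nearest-neighbour search instead of sorting all training examples.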

Evaluation

Evaluation using cross-validation

The sentiment classification is evaluated using 10-fold cross-validation over the training set. The performance of the algorithm was tested under different feature settings.

- Multi-class classification - There are 51 hashtag-based and 16 smiley-based labels. The evaluation metric is the average F-score over 10-fold cross-validation. The F-score for the random baseline is 0.02. The result is shown in the following table.

Multi.png

The result is significantly better than the random baseline.
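The cross-validation protocol can be sketched as follows. This is a hedged illustration: the summary does not say how the F-score is averaged across labels, so the macro-average used here is an assumption, and `cross_validate` and `train_and_predict` are hypothetical names standing in for whatever classifier is being evaluated.

```python
def f_score(gold, pred, label):
    """F1 score of one label, computed from parallel gold/predicted lists."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f(gold, pred):
    """Unweighted mean of per-label F1 over the labels present in gold."""
    labels = sorted(set(gold))
    return sum(f_score(gold, pred, label) for label in labels) / len(labels)


def cross_validate(examples, labels, train_and_predict, folds=10):
    """Average macro F-score over `folds` interleaved folds.

    train_and_predict(train_x, train_y, test_x) must return one predicted
    label per element of test_x.
    """
    if len(examples) < folds:
        raise ValueError("need at least one example per fold")
    scores = []
    for i in range(folds):
        test_idx = set(range(i, len(examples), folds))
        train_x = [x for j, x in enumerate(examples) if j not in test_idx]
        train_y = [y for j, y in enumerate(labels) if j not in test_idx]
        test_x = [x for j, x in enumerate(examples) if j in test_idx]
        test_y = [y for j, y in enumerate(labels) if j in test_idx]
        pred = train_and_predict(train_x, train_y, test_x)
        scores.append(macro_f(test_y, pred))
    return sum(scores) / folds


def majority_baseline(train_x, train_y, test_x):
    """Trivial classifier: always predict the most frequent training label."""
    most_common = max(set(train_y), key=train_y.count)
    return [most_common] * len(test_x)


print(cross_validate(list(range(20)), ["a"] * 20, majority_baseline))
```

Any classifier that fits the `train_and_predict` signature can be plugged in, which makes it easy to compare feature settings under the same folds.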

- Binary classification - A sentence is labeled 1 if it contains a particular label and 0 if it does not bear any sentiment. Binary classification is performed for each of the 50 hashtag-based and 15 smiley-based labels. The result is shown in the following table.

Bin.png

The results show that binary classification performs better than multi-class classification, with high precision.
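One way to read the binary setup is sketched below. It is an interpretation rather than the paper's procedure: the function name `binary_dataset` is hypothetical, and the choice to skip tweets that carry a different sentiment label (so that negatives are only sentiment-free tweets) is an assumption drawn from the description above.

```python
def binary_dataset(tweets, target, sentiment_labels):
    """Build (text, 0/1) pairs for one target label.

    tweets: list of (text, set_of_labels) pairs.
    A tweet is positive (1) if it carries the target label and negative (0)
    if it carries no sentiment label at all. Tweets with some other
    sentiment label are left out (assumption).
    """
    data = []
    for text, tags in tweets:
        if target in tags:
            data.append((text, 1))
        elif not (tags & sentiment_labels):
            data.append((text, 0))
    return data


# Toy data, invented for illustration only.
sentiment_labels = {"#happy", "#sad"}
tweets = [
    ("great day", {"#happy"}),
    ("train is late", set()),
    ("so tired", {"#sad"}),
    ("new phone", {"#tech"}),
]
print(binary_dataset(tweets, "#happy", sentiment_labels))
```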

Evaluation with human judges

Amazon Mechanical Turk (AMT) was used to evaluate the performance of the classifier on test data. A prediction was considered correct if one of the tags selected by a human judge for a sentence was among the 5 tags predicted by the algorithm. The correlation score for this task was κ = 0.41.
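The correctness criterion used with the human judges can be sketched as a top-5 hit rate. The function name `hit_rate` and the toy data are hypothetical, and the computation of κ is not shown.

```python
def hit_rate(judge_tags, predicted_tags, top_n=5):
    """Fraction of sentences where a judge's tag is among the top_n predictions.

    judge_tags: list of tag lists chosen by a human judge, one per sentence.
    predicted_tags: list of ranked tag lists from the classifier, one per sentence.
    """
    hits = 0
    for chosen, ranked in zip(judge_tags, predicted_tags):
        if set(chosen) & set(ranked[:top_n]):
            hits += 1
    return hits / len(judge_tags)


# Toy data, invented for illustration only.
judges = [["#happy"], ["#sad", "#angry"]]
predictions = [
    ["#fun", "#happy", "#lol", "#love", "#yay"],
    ["#lol", "#fun", "#love", "#yay", "#happy", "#sad"],
]
print(hit_rate(judges, predictions))
```

In the second sentence the judge's tag `#sad` is ranked sixth, so it falls outside the top 5 and does not count as a hit.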

Observations

  • This work presents a supervised classification framework that uses Twitter hashtags and smileys as proxy labels for different sentiment types. It avoids the need for labor-intensive manual annotation and allows identification and classification of diverse sentiment types in short texts.
  • Binary classification of sentiments yields better results than multi-class classification.
  • Punctuation, word and pattern features contribute more to classification performance than the n-gram features, which provide only a marginal boost. Pattern features perform better than the rest of the features combined.
  • The paper includes a preliminary exploration of inter-sentiment overlap and dependency using two simple techniques: tag co-occurrence and feature overlap.
  • In addition to the features used in the algorithm, features representing short-term and long-term distance in the tweets could also be added.
  • The evaluation could also be performed on blog data, in addition to tweets, to validate the use of these labels on other kinds of text.

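Study Plan

  • Davidov and Rappoport (2008) for understanding the automated pattern-based approach for extracting sentiments.
    • Davidov and Rappoport (2006)
  • Mishne (2005) - study on mood classification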

Related Work

Similar work on extracting sentiment types from blogs was carried out by McDonald et al. (2007).