M. Hurst and K. Nigam. Retrieving topical sentiments from online document collection.

Revision as of 23:08, 4 November 2012

This is a paper reviewed for Social Media Analysis 10-802 in Fall 2012.

Citation

title={Retrieving topical sentiments from online document collections},
author={Hurst, M.F. and Nigam, K.},
booktitle={Proceedings of SPIE},
volume={5296},
pages={27--34},
year={2004}

Online version

Retrieving topical sentiments from online document collection

Summary

This is one of the earlier works to combine topicality and polarity, i.e., to identify polar sentences about a topic. The authors argue for fusing topicality and polarity, using statistical machine learning approaches to identify topics and shallow NLP techniques to determine polarity. They argue for a locality assumption, whereby polar sentences that contain the topic express polarity about the topic.

Dataset Description

The dataset consists of 16,616 sentences from 982 messages from online sources (Usenet, online message boards, etc.) about a certain topic. The authors manually annotated 250 randomly selected sentences with the following labels:

  • Polarity Identification: positive, negative
  • Topic Identification: Topical, Out-of-Topic
  • Polarity and Topic Identification: positive-correlated, negative-correlated, positive-uncorrelated, negative-uncorrelated, topical, out-of-topic. The positive-correlated label indicates that the sentence contains a positive polar segment that refers to the topic; positive-uncorrelated indicates that there is some positive polarity but that it is not associated with the topic in question.

Task Description and Evaluation

Polarity Identification:

The authors use a rule-based approach to perform polarity identification. It has the following steps:

  • Tokenization followed by POS tagging using a statistical tagger trained on Penn Treebank data.
  • Semantic polarity tagging using a manually created, predefined topical lexicon tuned for the domain.
  • Chunking using simple POS tag patterns.
  • Rule-based syntactic patterns and negation rules to modify polarity and associate it with topics.
  • The syntactic patterns are: predicative modification (it is good), attributive modification (a good car), equality (it is a good car), and polar clause (it broke my car). The negation rules cover verbal attachment (it is not good, it isn't good).
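The lexicon and negation steps above can be illustrated with a minimal sketch. It is not the authors' system: it skips POS tagging and chunking entirely, the lexicon entries are made up for illustration, and the negation rule (flip polarity if a negator appears within a small window before the polar word) is an assumed simplification of the verbal-attachment rule.

```python
# Hypothetical toy lexicon; the paper's domain-tuned lexicon is not reproduced here.
POLAR_LEXICON = {"good": 1, "great": 1, "bad": -1, "broke": -1}
NEGATORS = {"not", "n't", "never"}


def tokenize(sentence):
    """Naive whitespace tokenizer that splits off the "n't" clitic."""
    tokens = []
    for raw in sentence.lower().split():
        word = raw.strip(".,!?")
        if word.endswith("n't") and len(word) > 3:
            tokens.extend([word[:-3], "n't"])
        elif word:
            tokens.append(word)
    return tokens


def sentence_polarity(sentence, window=2):
    """Return +1, -1 or 0 for a sentence.

    Each lexicon hit contributes its polarity; the hit is flipped when a
    negator occurs within `window` tokens before it (assumed rule).
    """
    tokens = tokenize(sentence)
    score = 0
    for i, tok in enumerate(tokens):
        polarity = POLAR_LEXICON.get(tok, 0)
        if polarity == 0:
            continue
        if any(t in NEGATORS for t in tokens[max(0, i - window):i]):
            polarity = -polarity
        score += polarity
    return (score > 0) - (score < 0)
```

Using the example phrases from the list, `sentence_polarity("it is good")` is positive, while both "it is not good" and "it isn't good" come out negative under this window rule.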

Performance: Their system achieved a precision of 82% at detecting positive polarity and a precision of 80% at detecting negative polarity.

Topic Identification

Here the authors identify the topicality of a sentence using a text classification approach. They use a variant of the Winnow classifier, an online learning algorithm that learns a linear decision boundary. Since they do not have enough sentence-level annotations for topicality, they use message-level labels (topical or not topical) to train the classifier on a standard bag-of-words representation.
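For readers unfamiliar with Winnow, the following is a minimal sketch of the basic algorithm with multiplicative updates over binary bag-of-words features. The summary does not say which variant the authors used, so the class name, the threshold choice (the vocabulary size), and the promotion factor `alpha` are assumptions made for illustration.

```python
class Winnow:
    """Basic Winnow with promotion and demotion over binary word features."""

    def __init__(self, vocabulary, alpha=2.0):
        self.index = {word: i for i, word in enumerate(vocabulary)}
        self.weights = [1.0] * len(self.index)
        self.alpha = alpha
        # Assumed threshold: the number of features.
        self.threshold = float(len(self.index))

    def _active(self, text):
        # Bag of words: the set of known words present in the text.
        return {self.index[w] for w in text.lower().split() if w in self.index}

    def predict(self, text):
        total = sum(self.weights[i] for i in self._active(text))
        return total >= self.threshold

    def update(self, text, is_topical):
        # Weights change only on mistakes (online, mistake-driven learning).
        if self.predict(text) == is_topical:
            return
        # Promote active features on a missed positive, demote on a false positive.
        factor = self.alpha if is_topical else 1.0 / self.alpha
        for i in self._active(text):
            self.weights[i] *= factor

    def fit(self, examples, epochs=10):
        for _ in range(epochs):
            for text, label in examples:
                self.update(text, label)
```

A model would be trained on `(message_text, is_topical)` pairs, for example `Winnow(vocabulary).fit(labeled_messages)`, and the same `predict` method could then be applied to whole messages or to individual sentences.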


During the testing phase:

  • Classify each message as topical or non-topical using the trained classifier.
  • If the message is topical, classify each sentence in the message using the same trained classifier.
  • If the sentence is topical, perform semantic analysis to determine its polarity.
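The three test-time steps form a simple cascade, which can be sketched as below. The function names, the regex-based sentence splitter, and the callables `is_topical` and `polarity_of` are placeholders introduced for this illustration; they stand in for the trained classifier and the polarity analysis described above.

```python
import re


def split_sentences(message):
    """Naive splitter on sentence-final punctuation; a stand-in for a real segmenter."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", message) if s.strip()]


def topical_polarities(message, is_topical, polarity_of):
    """Run the message -> sentence -> polarity cascade.

    is_topical: callable(text) -> bool, applied at both message and sentence level.
    polarity_of: callable(sentence) -> +1, -1 or 0.
    Returns (sentence, polarity) pairs for topical sentences only.
    """
    # Step 1: discard messages that are not about the topic.
    if not is_topical(message):
        return []
    results = []
    for sentence in split_sentences(message):
        # Step 2: keep only sentences the classifier marks as topical.
        if is_topical(sentence):
            # Step 3: polarity analysis runs on topical sentences only.
            results.append((sentence, polarity_of(sentence)))
    return results
```

Passing the classifier and the polarity function as arguments keeps the cascade independent of how either component is implemented.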

Performance: They achieved a message-level precision of 85.4% and a sentence-level precision of 79%.

Combining Polarity and Topical Models

  • 982 messages (containing 16,616 sentences) were classified as topical.
  • 1,262 of the 16,616 sentences were predicted to be topical; of these, 316 had positive polarity and 81 had negative polarity.
  • A precision of 72% was observed, where precision is the percentage of polar sentences containing the topic whose polarity is actually about the topic.
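To put the counts above in proportion, the short calculation below derives the shares they imply. It assumes the 316 positive and 81 negative sentences are subsets of the 1,262 sentences predicted topical, which is how the list reads; the variable names are illustrative.

```python
# Counts as reported in the summary above.
total_sentences = 16616
topical_sentences = 1262
positive, negative = 316, 81

# Assumption: the polar sentences are a subset of the topical sentences.
polar_topical = positive + negative
topical_share = topical_sentences / total_sentences
polar_share = polar_topical / topical_sentences

print(f"polar topical sentences: {polar_topical}")
print(f"share of sentences predicted topical: {topical_share:.1%}")
print(f"share of topical sentences that are polar: {polar_share:.1%}")
```

Under that assumption, 397 sentences are both topical and polar, which is the set over which the 72% precision is reported.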

Findings

Related papers

Study plan