McDonald et al, 2007

From Cohen Courses

Latest revision as of 08:02, 4 October 2012

This is a dataset compiled and studied by McDonald et al. in the paper "Structured models for fine-to-coarse sentiment analysis" (http://www.ryanmcd.com/papers/sentimentACL07.pdf).

It is a corpus of 600 online product reviews from three domains: car seats for children, fitness equipment, and MP3 players. Reviews were manually filtered to remove duplicates, reviews with insufficient text, and spam. Every review carries a document-level polarity label, positive or negative, assigned by online customers.

Each review was then split into sentences, and every sentence was labeled by a single annotator as positive, negative, or neutral. Sentences were annotated in the context of the document they came from. A sentence was labeled neutral if it conveyed no sentiment or if its sentiment could not be determined from context; many neutral sentences describe the circumstances under which the product was purchased. Sentences mentioning product features were common, and these were labeled positive or negative when the context supported it. That supporting evidence could be punctuation, such as exclamation points, smiley or frowny faces, and question marks, or it could come from another sentence, e.g., "I love it. It has 64Mb of memory and comes with a set of earphones".
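The two annotation levels described above (a binary label per review, a three-way label per sentence) can be pictured with a small in-memory data model. The sketch below is an assumption for illustration only: the class names, field names, and label codes are hypothetical and do not describe the dataset's actual file format. The example review reuses the two sentences quoted above, both labeled positive because the first supports the second; the third (neutral) sentence is invented.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum


class Polarity(Enum):
    # Hypothetical label codes; the dataset's real encoding may differ.
    POSITIVE = "pos"
    NEGATIVE = "neg"
    NEUTRAL = "neu"  # only valid at the sentence level


@dataclass
class Sentence:
    text: str
    label: Polarity  # assigned by one annotator, using document context


@dataclass
class Review:
    domain: str  # e.g. "car seats", "fitness equipment", "mp3 players"
    doc_label: Polarity  # positive or negative, assigned by online customers
    sentences: list[Sentence] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Document-level polarity is binary; neutral is reserved for sentences.
        if self.doc_label is Polarity.NEUTRAL:
            raise ValueError("document-level polarity must be positive or negative")


def sentence_label_counts(reviews: list[Review]) -> Counter:
    """Count sentence-level labels across a collection of reviews."""
    return Counter(s.label for r in reviews for s in r.sentences)


example = Review(
    domain="mp3 players",
    doc_label=Polarity.POSITIVE,  # hypothetical document label
    sentences=[
        Sentence("I love it.", Polarity.POSITIVE),
        # Positive only because the preceding sentence supplies the sentiment.
        Sentence("It has 64Mb of memory and comes with a set of earphones.", Polarity.POSITIVE),
        # Hypothetical neutral sentence about the purchase circumstances.
        Sentence("I bought it before a long flight.", Polarity.NEUTRAL),
    ],
)
counts = sentence_label_counts([example])
print(counts[Polarity.POSITIVE], counts[Polarity.NEUTRAL])
```

Keeping the neutral label out of the document level (the check in `__post_init__`) mirrors the description that customers labeled whole reviews only as positive or negative.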

McDonaldDataset.jpg