McDonald et al, 2007

From Cohen Courses

This dataset was compiled and studied by McDonald et al. in the paper "Structured models for fine-to-coarse sentiment analysis".

It is a corpus of 600 online product reviews from three domains: car seats for children, fitness equipment, and MP3 players. Reviews were manually filtered to remove duplicates, reviews with insufficient text, and spam. Each review was labeled by the online customer who wrote it as positive or negative at the document level. Each review was then split into sentences, and every sentence was annotated by a single annotator as positive, negative, or neutral.

All sentences were annotated based on their context within the document. A sentence was annotated as neutral if it conveyed no sentiment or if its sentiment could not be determined from context. Many neutral sentences describe the circumstances under which the product was purchased. A common class of sentences was those that mention product features. These were annotated as positive or negative only if the context supported a polarity. Supporting evidence could include punctuation such as exclamation points, question marks, and smiley or frowny faces. It could also come from another sentence, e.g., “I love it. It has 64Mb of memory and comes with a set of earphones”, where the second sentence takes its positive polarity from the first.
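The two annotation levels described above can be pictured as a small data structure. The following Python sketch is only an illustration: the class and field names (`Review`, `Sentence`, `domain`, `label_counts`) are hypothetical, the page does not describe the dataset's actual file format, and the domain and document label assigned to the example review are assumptions made for the sake of the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class DocLabel(Enum):
    """Document-level polarity, assigned by the customer who wrote the review."""
    POSITIVE = "positive"
    NEGATIVE = "negative"


class SentLabel(Enum):
    """Sentence-level polarity, assigned by a single annotator."""
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"


@dataclass
class Sentence:
    text: str
    label: SentLabel


@dataclass
class Review:
    domain: str              # hypothetical field: one of the three product domains
    label: DocLabel          # document-level label
    sentences: List[Sentence]

    def label_counts(self) -> Dict[SentLabel, int]:
        """Count how many sentences carry each sentence-level label."""
        counts = {label: 0 for label in SentLabel}
        for sentence in self.sentences:
            counts[sentence.label] += 1
        return counts


# The example from the text: the second sentence only lists features, but it
# is labeled positive because the preceding sentence supplies the sentiment.
# The domain and document label here are assumed for illustration.
example = Review(
    domain="mp3 players",
    label=DocLabel.POSITIVE,
    sentences=[
        Sentence("I love it.", SentLabel.POSITIVE),
        Sentence(
            "It has 64Mb of memory and comes with a set of earphones.",
            SentLabel.POSITIVE,
        ),
    ],
)

counts = example.label_counts()
```

Keeping the document label and the sentence labels in separate types reflects that they come from different sources (customers versus the annotator) and have different label sets: only sentences can be neutral.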

[Image: McDonaldDataset.jpg]