Pang et al EMNLP 2002
Pang, B., L. Lee, and S. Vaithyanathan. 2002. Thumbs up?: sentiment classification using machine learning techniques. In Proceedings of the ACL-02 conference on Empirical methods in natural language processing-Volume 10, 79–86.
This is an early and influential paper that introduced the use of supervised learning for review classification. The authors used a corpus of 1400 movie reviews, each labeled positive or negative according to the rating given by the review's own author, and compared several approaches to predicting polarity.
- A hand-coded lexicon of polar words. This gave an accuracy of around 70% (the dataset is balanced, so random chance would be 50%).
- Off-the-shelf classifier learners (Naive Bayes and SVMlight) that had performed well on topical text classification. These gave accuracies in the high 70s to low 80s.
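The two approaches above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the word lists, the whitespace tokenizer, the add-one smoothing, and the four toy training reviews are all assumptions made for the example, and the helper names (`lexicon_classify`, `NaiveBayes`) are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy lexicon; NOT the word lists used in the paper.
POSITIVE = {"great", "brilliant", "moving", "enjoyable"}
NEGATIVE = {"bad", "boring", "awful", "dull"}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def lexicon_classify(text):
    """Label a review by counting lexicon hits; ties default to 'pos'."""
    tokens = tokenize(text)
    pos_hits = sum(t in POSITIVE for t in tokens)
    neg_hits = sum(t in NEGATIVE for t in tokens)
    return "pos" if pos_hits >= neg_hits else "neg"


class NaiveBayes:
    """Multinomial Naive Bayes over unigram counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.word_counts = {label: Counter() for label in set(labels)}
        self.doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.word_counts):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Start from the log prior, then add per-token log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            for tok in tokenize(text):
                if tok in self.vocab:  # skip words never seen in training
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for this sketch.
train_texts = [
    "a great and moving film",
    "brilliant acting and an enjoyable plot",
    "a boring and dull story",
    "awful script and bad acting",
]
train_labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayes().fit(train_texts, train_labels)
```

The lexicon classifier needs no training data but only sees words someone thought to list, whereas the learned classifier picks up whatever words happen to separate the classes in the training reviews.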
The authors also explored some feature-engineering techniques suggested by the results in Turney (ACL 2002), such as using phrases (bigrams) instead of unigrams and adding part-of-speech information. None of these produced major improvements in accuracy.
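As a rough illustration of these feature variants, the sketch below builds unigram or bigram features and word-plus-tag features from a review. The function names, the use of binary presence features (a set rather than counts), and the `word_TAG` encoding are assumptions of this example; the part-of-speech tags are taken as given, since no tagger is included.

```python
def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text, n=1):
    """Map a review to a set of binary presence features over n-grams.

    n=1 gives unigram features; n=2 gives bigram features in their place.
    """
    tokens = text.lower().split()
    return set(ngrams(tokens, n))


def tag_features(tagged_tokens):
    """Turn (word, POS tag) pairs into word_TAG features.

    The tags are assumed to come from an external tagger.
    """
    return {f"{word.lower()}_{tag}" for word, tag in tagged_tokens}
```

For example, `extract_features("not very good", n=2)` yields the two features `"not very"` and `"very good"`, which keep some local context that the unigrams `"not"`, `"very"`, and `"good"` lose.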