The viability of web-derived Polarity Lexicons
This is a paper reviewed for Social Media Analysis 10-802 in Fall 2012.
Citation
author = {Leonid Velikovich and Sasha Blair-Goldensohn and Kerry Hannan and Ryan T. McDonald}, title = {The viability of web-derived polarity lexicons}, booktitle = {HLT-NAACL}, year = {2010}, pages = {777-785}, ee = {http://www.aclweb.org/anthology/N10-1119}, bibsource = {DBLP, http://dblp.uni-trier.de}
Online version
The viability of web-derived polarity lexicons
Summary
The authors examine the viability of building large polarity lexicons semi-automatically from the web. They describe a graph propagation approach that builds an English lexicon without relying on language-dependent resources such as WordNet or POS taggers, unlike previous approaches to sentiment analysis. As a result, the proposed lexicons are not limited to specific word classes and also contain slang, misspellings, multiword expressions, etc. They report a qualitative and quantitative evaluation of the derived lexicons and show superior performance to previously studied lexicons on the sentence polarity classification task.
Approach
Polarity lexicons are large lists of phrases that encode the polarity of each phrase, either positive or negative, often with a score representing the magnitude of the polarity. The authors propose a graph propagation approach inspired by previous work on constructing polarity lexicons from lexical graphs, but without using linguistic resources like WordNet. Instead, the graph is built using co-occurrence statistics from the entire web.
The algorithm differs from common graph propagation algorithms such as label propagation. Its output is a polarity vector $\mathrm{pol}$, where $\mathrm{pol}_i$ is the polarity score of candidate phrase $v_i$. The algorithm computes both a positive score $\mathrm{pol}^+_i$ and a negative score $\mathrm{pol}^-_i$ for each node $v_i$ in the graph. Each score is the sum, over every seed word of the corresponding polarity (positive or negative), of the weight of the maximum-weight path from that seed to $v_i$. The final polarity of a phrase is $\mathrm{pol}_i = \mathrm{pol}^+_i - \beta\,\mathrm{pol}^-_i$, where $\beta$ is a constant that accounts for the overall mass of positive and negative flow in the graph. The algorithm is iterative and considers paths of increasing length at each iteration. The input variable T controls the maximum path length considered. The parameter $\gamma$ defines the minimum polarity magnitude a phrase must have to be included in the lexicon.
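The propagation step described above can be sketched on a toy graph. This is a minimal illustration and not the authors' implementation. It assumes that the weight of a path is the product of its edge weights (each in [0, 1]) and that the balancing constant is the ratio of total positive to total negative flow. The names `seed_flow`, `polarity_lexicon`, `toy_graph`, `max_len` and `gamma` are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def seed_flow(graph, seeds, max_len):
    """For each node, sum over seeds the best path weight within max_len hops.

    graph: {node: {neighbour: weight in [0, 1]}}, assumed symmetric.
    Assumption: the weight of a path is the product of its edge weights.
    """
    flow = defaultdict(float)
    for seed in seeds:
        best = {seed: 1.0}  # best known path weight from this seed
        for _ in range(max_len):  # each pass allows paths one edge longer
            updated = dict(best)
            for k, weight_k in best.items():
                for j, w in graph.get(k, {}).items():
                    updated[j] = max(updated.get(j, 0.0), weight_k * w)
            best = updated
        for node, weight in best.items():
            flow[node] += weight
    return dict(flow)


def polarity_lexicon(graph, pos_seeds, neg_seeds, max_len, gamma):
    """Combine positive and negative flow into one signed score per phrase."""
    pos = seed_flow(graph, pos_seeds, max_len)
    neg = seed_flow(graph, neg_seeds, max_len)
    total_neg = sum(neg.values())
    # Assumption: beta rescales negative flow to match total positive flow.
    beta = sum(pos.values()) / total_neg if total_neg else 0.0
    lexicon = {}
    for node in set(pos) | set(neg):
        pol = pos.get(node, 0.0) - beta * neg.get(node, 0.0)
        if abs(pol) >= gamma:  # drop phrases with weak polarity
            lexicon[node] = pol
    return lexicon


# Hypothetical toy graph; "meh" is connected to both a positive and a negative word.
toy_graph = {
    "good": {"great": 0.8, "meh": 0.2},
    "great": {"good": 0.8},
    "bad": {"awful": 0.9, "meh": 0.3},
    "awful": {"bad": 0.9},
    "meh": {"good": 0.2, "bad": 0.3},
}

lexicon = polarity_lexicon(toy_graph, {"good"}, {"bad"}, max_len=2, gamma=0.1)
for phrase, score in sorted(lexicon.items(), key=lambda item: item[1]):
    print(f"{phrase}: {score:+.3f}")
```

Because only the single best path from each seed contributes, a weakly connected phrase such as "meh" receives little flow from either side and falls below the threshold, while phrases close to one seed set keep a clear sign.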
- Building the phrase graph from the web:
For this study the authors used an English graph whose node set V was based on all n-grams up to length 10 extracted from 4 billion web pages, filtered to 20 million candidate phrases via heuristics. A context vector was built for each phrase from a window of size six, aggregated over all mentions of the phrase in the set. Edges E are constructed by computing the cosine similarity between context vectors and then keeping only edges that are among the 25 highest-weighted edges adjacent to either of their endpoints, which reduces the graph size and removes spurious edges caused by frequently occurring phrases. Because of the large context windows, this graph can contain edges between positive and negative sentiment words. The authors propose that the algorithm handles this by computing polarity as the aggregate of all the best paths to seed words. Choosing the best path to each seed word, rather than all paths as in label propagation, is the main difference between the two approaches.
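The graph construction can be illustrated at toy scale. The sketch below is not the authors' pipeline: it treats every token as a candidate phrase (no n-grams and no filtering heuristics), and it assumes that a "window of size six" means three tokens on each side of a mention. The names `context_vectors`, `cosine`, `build_graph` and `top_k` are hypothetical.

```python
import math
from collections import Counter


def context_vectors(docs, window=6):
    """Aggregate context counts for each token over all of its mentions.

    Assumption: a window of size six is three tokens on each side.
    """
    half = window // 2
    vectors = {}
    for tokens in docs:
        for i, tok in enumerate(tokens):
            context = tokens[max(0, i - half):i] + tokens[i + 1:i + 1 + half]
            vectors.setdefault(tok, Counter()).update(context)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def build_graph(vectors, top_k=25):
    """Keep an edge only if it is among the top_k edges of either endpoint."""
    names = sorted(vectors)
    sims = {a: {} for a in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            w = cosine(vectors[a], vectors[b])
            if w > 0:
                sims[a][b] = w
                sims[b][a] = w
    top = {
        a: set(sorted(nbrs, key=nbrs.get, reverse=True)[:top_k])
        for a, nbrs in sims.items()
    }
    graph = {a: {} for a in names}
    for a, nbrs in sims.items():
        for b, w in nbrs.items():
            if b in top[a] or a in top[b]:
                graph[a][b] = w
    return graph


# Hypothetical mini corpus, already tokenized.
docs = [
    ["the", "food", "was", "great"],
    ["the", "food", "was", "awful"],
    ["the", "service", "was", "great"],
]
graph = build_graph(context_vectors(docs), top_k=25)
print(graph["great"])
```

In this toy corpus "great" and "awful" share almost all of their context, so they end up connected, which is the kind of edge between opposite-polarity words that the best-path aggregation is meant to tolerate.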
Task Description and Evaluation
- Lexicon Statistics:
The lexicon was generated from 187 positive and 192 negative manually annotated seeds, producing a lexicon of 178,104 phrases.
- Comparison with other Lexicon Sets:
  - Wilson et al.
  - WordNet LP
  - Web GP: the web-derived lexicon from this paper.
- Qualitative Evaluation:
  - Distribution of phrases by number of tokens: the most frequent phrase length is 2, and longer phrases are less frequent.
  - Multiword phrases are identified. Spelling variations are more prominent for positive phrases than for negative phrases.
  - Vulgar and derogatory phrases and racial slurs are abundant among phrases that received negative sentiment.
- Quantitative Evaluation:
  - Performance is measured on a sentence sentiment classification/ranking task.
  - Dataset: 554 consumer reviews described in McDonald et al., 2007, comprising 3916 sentences (1525 positive, 1542 negative and 849 neutral).
  - Evaluation:
    - Lexicon classifier:
      - Classification is done using an augmented Vote-Flip algorithm.
      - Ranking is done using the purity of sentence X, a score in [-1, 1].
    - Contextual classifier: a maximum entropy model trained and evaluated with 10-fold cross-validation on the evaluation data.
    - Meta classifier: the contextual classifier using features derived from all lexicons.
  - Results:
Findings
- Web-derived lexicons seem to capture phrases not captured by earlier systems, such as spelling variations, slang, vulgarity and multiword expressions.
- The web-derived lexicon shows superior performance to previously published English lexicons.
- The approach is independent of language-dependent resources like WordNet and makes use of unlabeled data.
Related papers
- Description of WordNet LP
S. Blair-Goldensohn, K. Hannan, R. McDonald, T. Neylon, G.A. Reis, and J. Reynar. 2008. Building a sentiment summarizer for local service reviews. In NLP in the Information Explosion Era.
- Wilson et al. Lexicon set.
T. Wilson, J. Wiebe, and P. Hoffmann. 2005. Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).
- Dataset for evaluation McDonald et al.
R. McDonald, K. Hannan, T. Neylon, M. Wells, and J. Reynar. 2007. Structured models for fine-to-coarse sentiment analysis. In Proceedings of the Annual Conference of the Association for Computational Linguistics (ACL).
- Related work in Japanese.
N. Kaji and M. Kitsuregawa. 2007. Building lexicon for sentiment analysis from massive collection of HTML documents. In Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL).
- Label Propagation
X. Zhu and Z. Ghahramani. 2002. Learning from labeled and unlabeled data with label propagation. Technical report, CMU CALD tech report CMU-CALD-02.
Study plan
- Label Propagation Algorithm
- Vote Flip Algorithm
- Maximum Entropy Model