Esuli and Sebastiani LREC 2006
This is a paper for Social Media Analysis 10-802 in Fall 2012.
Citation
Esuli, Andrea and Sebastiani, Fabrizio, SENTIWORDNET: A Publicly Available Lexical Resource for Opinion Mining, 2006, In Proceedings of the 5th Conference on Language Resources and Evaluation (LREC 06)
Online version
SENTIWORDNET: A Publicly Available Lexical Resource for Opinion Mining
Summary
This paper uses the WordNet dataset as a resource for addressing opinion mining. The authors develop a three-part layer over the existing WordNet ontology, adding scores for Objectivity, Positivity and Negativity for each WordNet synset (collection of terms with the same meaning).
The approach is semi-supervised: a small hand-labeled set of seed terms drives an automatic process that generates more labeled data. The authors use WordNet lexical relations to expand both the positive and the negative seed sets; the remaining terms are labeled as objective when they coincide with terms excluded from the General Inquirer lexicon.
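The seed-expansion step can be illustrated with a minimal sketch. It assumes that relations fall into two groups, those that preserve polarity (such as synonymy) and those that flip it (such as antonymy); the function name `expand_seeds`, the toy relation tables, and the fixed iteration count are hypothetical and stand in for real WordNet lookups.

```python
def expand_seeds(pos_seeds, neg_seeds, same_polarity, opposite_polarity, iterations=2):
    """Grow positive and negative term sets by following lexical relations.

    same_polarity / opposite_polarity map a term to related terms that are
    assumed to share, or to reverse, its polarity (toy stand-ins for WordNet).
    """
    pos, neg = set(pos_seeds), set(neg_seeds)
    for _ in range(iterations):
        new_pos, new_neg = set(pos), set(neg)
        for term in pos:
            new_pos.update(same_polarity.get(term, ()))      # same polarity: stays positive
            new_neg.update(opposite_polarity.get(term, ()))  # flipped: becomes negative
        for term in neg:
            new_neg.update(same_polarity.get(term, ()))      # same polarity: stays negative
            new_pos.update(opposite_polarity.get(term, ()))  # flipped: becomes positive
        pos, neg = new_pos, new_neg
    return pos, neg


# Toy relation tables (hypothetical data, not taken from WordNet).
SAME = {"good": ["nice"], "nice": ["pleasant"], "bad": ["awful"]}
OPPOSITE = {"good": ["bad"], "nice": ["nasty"]}

pos, neg = expand_seeds({"good"}, {"bad"}, SAME, OPPOSITE, iterations=2)
print(sorted(pos))  # ['good', 'nice', 'pleasant']
print(sorted(neg))  # ['awful', 'bad', 'nasty']
```

Each extra iteration reaches terms one relation further from the seeds. The sketch does not resolve terms that end up in both sets, which a real pipeline would have to handle.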
Given the training datasets, the glosses (dictionary definitions) for each synset are represented as vectors using cosine-normalized tf * idf weighting and are then fed into standard supervised learning algorithms (Rocchio and SVMs) to generate several semi-independent classifiers. The binary outputs of these classifiers are then combined to produce a score between 0.0 and 1.0 inclusive for each of objectivity, positivity and negativity.
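As a rough illustration of the two ideas in this paragraph, the sketch below computes cosine-normalized tf * idf vectors for a few glosses and turns a set of classifier votes into three scores. It assumes raw term counts for tf, `log(N / df)` for idf, and a score equal to the fraction of classifiers voting for each label; these choices, the function names, and the toy glosses are assumptions for illustration, not details confirmed by this summary.

```python
import math
from collections import Counter


def tfidf_vectors(glosses):
    """Return one cosine-normalized tf * idf vector (a dict) per gloss."""
    docs = [gloss.lower().split() for gloss in glosses]
    n_docs = len(docs)
    # Document frequency: in how many glosses each token appears.
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        weights = {tok: count * math.log(n_docs / df[tok]) for tok, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        if norm > 0:  # cosine normalization: scale the vector to unit length
            weights = {tok: w / norm for tok, w in weights.items()}
        vectors.append(weights)
    return vectors


def committee_scores(votes):
    """Turn per-classifier labels into three scores that sum to 1."""
    counts = Counter(votes)
    return {label: counts[label] / len(votes)
            for label in ("positive", "negative", "objective")}


vectors = tfidf_vectors(["pleasant and good", "unpleasant and bad"])
scores = committee_scores(["positive"] * 6 + ["objective"] * 2)
print(scores)  # {'positive': 0.75, 'negative': 0.0, 'objective': 0.25}
```

In the toy glosses, the token "and" occurs in every gloss, so its idf and therefore its weight is zero; only the distinguishing words contribute to each vector.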
The entire WordNet dataset is then assigned scores using the trained classifiers.
Results
The authors found that nearly 1/4 of all WordNet terms were labeled as non-objective to some degree by their classifiers. However, as the required degree of non-objectivity increases, the number of qualifying terms decreases sharply. Thus, only a relatively small proportion of WordNet terms convey strong sentiment.
As for accuracy, the authors acknowledge that they currently lack the ability to verify the results output by their committee classifier, since the lack of labeled data was what prompted their approach in the first place.
- As a proxy for determining accuracy, the authors previously compared their labelings to those of the General Inquirer lexicon, as reported in Esuli and Sebastiani EACL 2006.
- They report that a large-scale manual labeling project is in preparation, in which five independent evaluators will label 1000 WordNet terms; this would later allow them to compare their results against a human-generated ground truth.
Visual Output
The authors present a web-based tool [1] that visualizes the relationship between objectivity, positivity and negativity scores for each term. The sum of the three scores is 1, so the results can be represented within a simplex, with the corners representing full objectivity, full positivity or full negativity.
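Because the three scores sum to 1, they behave like barycentric coordinates, and a term can be placed inside a triangle by taking a weighted average of the corner positions. The sketch below shows one way to do this; the corner layout in `CORNERS` and the function name `simplex_point` are hypothetical and are not taken from the authors' tool.

```python
import math

# Hypothetical corner layout for an equilateral triangle with unit sides.
CORNERS = {
    "positive": (0.0, 0.0),
    "negative": (1.0, 0.0),
    "objective": (0.5, math.sqrt(3) / 2),
}


def simplex_point(positive, negative, objective, tol=1e-9):
    """Map three scores that sum to 1 to (x, y) coordinates in the triangle."""
    scores = {"positive": positive, "negative": negative, "objective": objective}
    if min(scores.values()) < 0 or abs(sum(scores.values()) - 1.0) > tol:
        raise ValueError("scores must be non-negative and sum to 1")
    # Weighted average of the corners, using the scores as weights.
    x = sum(scores[name] * CORNERS[name][0] for name in CORNERS)
    y = sum(scores[name] * CORNERS[name][1] for name in CORNERS)
    return x, y


print(simplex_point(1.0, 0.0, 0.0))  # (0.0, 0.0), the fully positive corner
x, y = simplex_point(0.25, 0.25, 0.5)
```

A fully objective term lands on the top corner, while a term with equal positive and negative scores sits on the vertical line through the middle of the triangle.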
Discussion
The paper presents a potentially useful resource in SentiWordNet, which has applications in sentiment analysis tasks. The authors also develop a web-based tool for visualizing the three-part scoring relationship for each term. The resource and the tool may be useful, but their true value depends on the accuracy of the scores, which is currently unknown. Even if comprehensive, the resource might not be useful to researchers if the sentiment scores output by the classifiers do not reflect the true sentiment of terms. Thus, this research represents an interesting direction in which more work needs to be done.
Related papers
- Esuli and Sebastiani EACL 2006 is the authors' prior work that led to the topic of the current paper.
- Andreevskaia and Bergler EACL 2006 discusses sentiment tag extraction from WordNet glosses.
- Hatzivassiloglou and McKeown ACL 1997 describes early work in using sentiment word lists to classify text sentiment.
- Yu and Hatzivassiloglou EMNLP 2003 investigates the problem of identifying objective and sentimental sentences and determining their polarity.
Study plan
Some concepts which may aid in understanding this paper:
- Term Frequency, Inverse Document Frequency (tf * idf)
  - A tutorial slideshow discussing the particular method the authors used to vectorize and weight synset glosses.
- WordNet
  - A description of the WordNet lexical resource.
  - WordNet Home