SentiWordNet: A Publicly Available Lexical Resource for Opinion Mining

Citation

Andrea Esuli and Fabrizio Sebastiani, "SentiWordNet: A Publicly Available Lexical Resource for Opinion Mining". In Proceedings of the 5th Conference on Language Resources and Evaluation (LREC'06), 417-422.

Online version

LREC 2006, SentiWordNet: A Publicly Available Lexical Resource for Opinion Mining

Summary

This paper discusses the development of SentiWordNet, a lexical resource in which each WordNet synset s is associated with three numerical scores Obj(s), Pos(s), Neg(s) describing how objective, positive, and negative the terms contained in the synset are. Each score lies in [0, 1], and the three scores of a synset sum to 1.

The motivation behind this research is to aid Opinion mining by providing an off-the-shelf lexical resource with fine-grained opinion scores for a large set of words.
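
For context, later releases of SentiWordNet are distributed with NLTK, which can be used to inspect the three scores of a synset. The snippet below is only an illustrative sketch of how the resource is consumed off the shelf; the synset name 'good.a.01' is an arbitrary example, and the scores depend on the SentiWordNet version bundled with NLTK, not the one described in this paper:

  # Sketch: reading SentiWordNet scores through NLTK's corpus reader.
  import nltk
  nltk.download('wordnet')        # WordNet itself
  nltk.download('sentiwordnet')   # the SentiWordNet annotations
  from nltk.corpus import sentiwordnet as swn

  s = swn.senti_synset('good.a.01')                    # one WordNet synset
  print(s.pos_score(), s.neg_score(), s.obj_score())   # Pos(s), Neg(s), Obj(s); the three sum to 1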

The development method of SentiWordNet adapts the PN-polarity [Esuli and Sebastiani, 2005] and SO-polarity [Esuli and Sebastiani, 2006] identification methods. The proposed method uses a committee of ternary classifiers, each capable of deciding whether a synset is Positive, Negative, or Objective. The ternary classifiers differ from one another in two respects: the training data used and the learning algorithm. Each ternary classifier therefore produces different classification results for a synset, and the final opinion scores are obtained by normalizing the scores from all the classifiers.
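
A minimal sketch of this final scoring step, assuming the scores are simply the normalized proportions of committee members assigning each label (the helper name and the votes below are hypothetical):

  from collections import Counter

  def aggregate_committee(votes):
      # votes: one Positive/Negative/Objective decision per ternary classifier
      counts = Counter(votes)
      total = len(votes)
      return {label: counts.get(label, 0) / total
              for label in ("Positive", "Negative", "Objective")}

  # e.g. a committee of eight classifiers judging one synset
  votes = ["Positive"] * 5 + ["Objective"] * 2 + ["Negative"]
  print(aggregate_committee(votes))   # {'Positive': 0.625, 'Negative': 0.125, 'Objective': 0.25}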

Background

In Opinion mining there are three main tasks related to tagging a given text with the opinion it expresses:

  1. Determining text SO-polarity, i.e. whether the text has a factual nature or expresses an opinion (Pang and Lee, 2004; Hatzivassiloglou, 2003).
  2. Determining text PN-polarity, i.e. whether the text expresses a positive or negative opinion on its subject matter (Pang and Lee, 2004; Turney, ACL 2002).
  3. Determining the strength of text PN-polarity, i.e. the emphasis of the expressed opinion (weak, mild, strong) (Pang and Lee, 2005; Wilson et al., 2004).

Method

Training Data

A small subset L of the training data Tr is manually labeled. The labeled data L is the union of three sets: Lo (objective synsets), Lp (positive synsets), and Ln (negative synsets). Lp and Ln are iteratively expanded over K iterations into the final training sets TrKp and TrKn. The expansion strategy navigates relations between synsets and adds the newly reached synsets to the corresponding training set, depending on whether the relation preserves or inverts the labels of the synsets involved. Inspired by [Valitutti et al., 2004], the authors use the direct antonymy, similarity, derived-from, pertains-to, attribute, and also-see relations to expand the seed sets. The Lo set is collected in two ways: first, by collecting all synsets that belong neither to TrKp nor to TrKn; second, by collecting synsets containing terms not marked as either Positive or Negative in the General Inquirer lexicon [Stone et al., 1966].
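
The expansion step can be sketched with NLTK's WordNet interface. This is an assumption-laden illustration rather than the authors' code: it only shows how the label-preserving relations (similarity, also-see, attribute, pertains-to, derived-from) and the label-inverting antonymy relation could be followed for K iterations from the seed synsets:

  from nltk.corpus import wordnet as wn

  def preserving_neighbors(synset):
      # Relations assumed to preserve the seed's label
      neighbors = set(synset.similar_tos()) | set(synset.also_sees()) | set(synset.attributes())
      for lemma in synset.lemmas():
          neighbors |= {l.synset() for l in lemma.pertainyms()}
          neighbors |= {l.synset() for l in lemma.derivationally_related_forms()}
      return neighbors

  def inverting_neighbors(synset):
      # Direct antonymy inverts the label
      return {a.synset() for lemma in synset.lemmas() for a in lemma.antonyms()}

  def expand(pos_seeds, neg_seeds, k):
      pos, neg = set(pos_seeds), set(neg_seeds)
      for _ in range(k):
          pos_new = {n for s in pos for n in preserving_neighbors(s)} | \
                    {n for s in neg for n in inverting_neighbors(s)}
          neg_new = {n for s in neg for n in preserving_neighbors(s)} | \
                    {n for s in pos for n in inverting_neighbors(s)}
          pos |= pos_new
          neg |= neg_new
      return pos, neg   # TrKp, TrKn after k expansion steps

  trkp, trkn = expand({wn.synset('good.a.01')}, {wn.synset('bad.a.01')}, k=2)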

Data Representation

Each synset is represented by a vector obtained by applying cosine-normalized tf-idf weighting, preceded by stop-word removal, to its gloss. The assumption behind this approach is that terms with similar polarity tend to have similar glosses; gloss words are therefore good features for the classifiers.
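
A small sketch of this vectorization using scikit-learn, whose TfidfVectorizer applies cosine (L2) normalization by default; the three synsets below are arbitrary examples, and the stop-word list is scikit-learn's built-in English list rather than whatever the authors used:

  from nltk.corpus import wordnet as wn
  from sklearn.feature_extraction.text import TfidfVectorizer

  # Toy sample of synsets; in SentiWordNet every WordNet synset is vectorized.
  synsets = [wn.synset('good.a.01'), wn.synset('bad.a.01'), wn.synset('table.n.02')]
  glosses = [s.definition() for s in synsets]

  # Stop-word removal followed by cosine-normalized tf-idf weighting of the glosses.
  vectorizer = TfidfVectorizer(stop_words='english', norm='l2')
  X = vectorizer.fit_transform(glosses)    # one row vector per synset gloss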

Ternary Classification Model

The vectorial representations of the training synsets for a given label ci are fed to a standard supervised learner, which generates two binary classifiers. The first classifier learns to distinguish positive from non-positive terms; the second learns to distinguish negative from non-negative terms. A new term is Positive if the first classifier labels it positive and the second labels it non-negative; symmetrically, it is Negative if the first labels it non-positive and the second labels it negative. Terms that are classified (i) as both positive and negative, or (ii) as both non-positive and non-negative, are taken to be Objective. In the training phase the terms in TrKn U TrKo are used as training examples of the category non-positive, and the terms in TrKp U TrKo as training examples of the category non-negative. The resulting ternary classifier is then applied to the vectorial representations of all WordNet synsets, including those in TrK - L, to produce the sentiment classification of the entire WordNet.
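
One committee member could be sketched as below, assuming dense gloss vectors and a linear SVM as the learner; the paper varies both the learner and the training-set size across committee members, and the function names here are illustrative:

  import numpy as np
  from sklearn.svm import LinearSVC

  def train_member(X_pos, X_neg, X_obj):
      # X_*: gloss vectors of TrKp, TrKn and TrKo respectively (dense arrays here)
      X = np.vstack([X_pos, X_neg, X_obj])
      # classifier 1: positive vs. non-positive (TrKn U TrKo)
      y_pos = np.r_[np.ones(len(X_pos)), np.zeros(len(X_neg) + len(X_obj))]
      # classifier 2: negative vs. non-negative (TrKp U TrKo)
      y_neg = np.r_[np.zeros(len(X_pos)), np.ones(len(X_neg)), np.zeros(len(X_obj))]
      return LinearSVC().fit(X, y_pos), LinearSVC().fit(X, y_neg)

  def classify(clf_pos, clf_neg, x):
      # x: a single gloss vector shaped as a 1-row matrix
      p, n = clf_pos.predict(x)[0], clf_neg.predict(x)[0]
      if p == 1 and n == 0:
          return "Positive"
      if n == 1 and p == 0:
          return "Negative"
      return "Objective"   # both classifiers fired, or neither did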


Visualization

A given synset's scores can be visualized as a point inside a triangle whose vertices correspond to maximally Positive, Negative, and Objective scores.
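
A hypothetical rendering of that triangle with matplotlib, placing a synset at the barycentric mix of its three scores; the vertex layout and the example scores are illustrative only:

  import numpy as np
  import matplotlib.pyplot as plt

  def plot_sentiment_triangle(pos, neg, obj, label=''):
      # Vertices stand for maximally Positive, Negative and Objective senses.
      vertices = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, np.sqrt(3) / 2]])
      point = pos * vertices[0] + neg * vertices[1] + obj * vertices[2]  # barycentric mix
      outline = np.vstack([vertices, vertices[:1]])
      plt.plot(outline[:, 0], outline[:, 1], 'k-')
      for (x, y), name in zip(vertices, ['Positive', 'Negative', 'Objective']):
          plt.annotate(name, (x, y))
      plt.scatter(point[0], point[1])
      plt.annotate(label, point)
      plt.axis('equal')
      plt.axis('off')
      plt.show()

  plot_sentiment_triangle(0.625, 0.0, 0.375, label='good.a.01')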