Determining term subjectivity and term orientation for opinion mining.

Citation

 title = {Determining Term Subjectivity and Term Orientation for Opinion Mining},
 author = {Andrea Esuli and Fabrizio Sebastiani},
 booktitle = {Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL’06)},
 year = {2006}

Abstract from the paper

Opinion mining is a recent subdiscipline of computational linguistics which is concerned not with the topic a document is about, but with the opinion it expresses. To aid the extraction of opinions from text, recent work has tackled the issue of determining the orientation of “subjective” terms contained in text, i.e. deciding whether a term that carries opinionated content has a positive or a negative connotation. This is believed to be of key importance for identifying the orientation of documents, i.e. determining whether a document expresses a positive or negative opinion about its subject matter.

We contend that the plain determination of the orientation of terms is not a realistic problem, since it starts from the non-realistic assumption that we already know whether a term is subjective or not; this would imply that a linguistic resource that marks terms as “subjective” or “objective” is available, which is usually not the case. In this paper we confront the task of deciding whether a given term has a positive connotation, or a negative connotation, or has no subjective connotation at all; this problem thus subsumes the problem of determining subjectivity and the problem of determining orientation. We tackle this problem by testing three different variants of a semi-supervised method previously proposed for orientation detection. Our results show that determining subjectivity and orientation is a much harder problem than determining orientation alone.

Online version

pdf link to the paper


Summary of approach

  • This article presents a method for determining both term subjectivity and term orientation for opinion mining applications. The proposed semi-supervised learning algorithm labels each term as having a Positive connotation (e.g. honest, intrepid) or a Negative connotation (e.g. disturbing, superfluous), or having instead no Subjective connotation at all (e.g. white, triangular).
  • The algorithm starts from three small, human-labelled seed sets (Positive terms, Negative terms, and Objective terms, the last being the complement of the union of Positive and Negative) and iteratively extends them by navigating the WordNet graph along the synonymy and antonymy relations.
  • The WordNet glosses of terms are used to build a vectorial representation for each term, where words are weighted by cosine-normalized tf * idf. This representation method is based on the assumption that terms with similar orientation tend to have similar glosses.
  • The vectorial representations of the terms in the expanded labelled sets are then used to train a supervised learner, which classifies the remaining terms.
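
The seed expansion and gloss representation described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `SYNONYMS`, `ANTONYMS` and `GLOSSES` dictionaries are hypothetical toy stand-ins for WordNet, the function names are invented for this example, and the rule that synonyms keep a label while antonyms receive the opposite one is an assumption about how the two relations are used. Conflicts (a term reaching both sets) are not resolved here.

```python
import math
from collections import Counter

# Hypothetical toy stand-ins for WordNet relations and glosses.
SYNONYMS = {"good": {"honest"}, "bad": {"disturbing"}}
ANTONYMS = {"good": {"bad"}, "honest": {"dishonest"}}
GLOSSES = {
    "honest": "marked by truth and good character",
    "dishonest": "deceptive or lacking truth",
    "white": "having the color of fresh snow",
}


def expand_seeds(positive, negative, iterations):
    """Grow both seed sets: synonyms keep the label, antonyms flip it (assumed rule)."""
    pos, neg = set(positive), set(negative)
    for _ in range(iterations):
        new_pos, new_neg = set(pos), set(neg)
        for term in pos:
            new_pos |= SYNONYMS.get(term, set())
            new_neg |= ANTONYMS.get(term, set())
        for term in neg:
            new_neg |= SYNONYMS.get(term, set())
            new_pos |= ANTONYMS.get(term, set())
        pos, neg = new_pos, new_neg
    return pos, neg


def gloss_vectors(glosses):
    """Map each term to a cosine-normalized tf * idf vector of its gloss words."""
    counts = {term: Counter(text.lower().split()) for term, text in glosses.items()}
    n_docs = len(counts)
    # Document frequency: number of glosses in which each word occurs.
    doc_freq = Counter(word for c in counts.values() for word in c)
    vectors = {}
    for term, c in counts.items():
        weights = {w: tf * math.log(n_docs / doc_freq[w]) for w, tf in c.items()}
        norm = math.sqrt(sum(x * x for x in weights.values()))
        # Divide by the Euclidean norm so every non-zero vector has unit length.
        vectors[term] = {w: x / norm for w, x in weights.items()} if norm else weights
    return vectors


pos, neg = expand_seeds({"good"}, {"bad"}, iterations=2)
vectors = gloss_vectors(GLOSSES)
```

In this toy run, words shared between glosses (such as "truth") receive a lower weight than words unique to one gloss, which is the effect the idf factor is meant to have.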


Experiments and results

  • The benchmark used for the experiments is the General Inquirer lexicon. From this lexicon, about 2K Positive and 2.2K Negative terms are extracted, and 5K terms labelled as neither Positive nor Negative are selected as Objective.
  • To classify terms into the three categories, the authors experiment with three learning approaches. Approach 1 and Approach 2 are two-stage methods, each built from two binary classifiers. In Approach 1 they first classify terms as Subjective or Objective, and then split the Subjective terms into Positive and Negative. In Approach 2 they first separate Positive from not Positive, and then split the not Positive terms into Negative and not Negative (Objective). Approach 3 is a single ternary classifier.
  • The best accuracy obtained across 120 different experiments is 67%, achieved with Approach 2.
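
The two-stage decision logic of Approach 1 and Approach 2, as described above, can be sketched as a composition of two binary classifiers. This is only an illustration under stated assumptions: the function names are invented for this example, and the binary classifiers are hypothetical lookup tables rather than learners trained on gloss vectors.

```python
from typing import Callable

BinaryClassifier = Callable[[str], bool]


def approach_1(is_subjective: BinaryClassifier,
               is_positive: BinaryClassifier) -> Callable[[str], str]:
    """Stage 1: Subjective vs. Objective. Stage 2: Positive vs. Negative."""
    def classify(term: str) -> str:
        if not is_subjective(term):
            return "Objective"
        return "Positive" if is_positive(term) else "Negative"
    return classify


def approach_2(is_positive: BinaryClassifier,
               is_negative: BinaryClassifier) -> Callable[[str], str]:
    """Stage 1: Positive vs. not Positive. Stage 2: Negative vs. Objective."""
    def classify(term: str) -> str:
        if is_positive(term):
            return "Positive"
        return "Negative" if is_negative(term) else "Objective"
    return classify


# Hypothetical stand-ins for trained binary classifiers.
POSITIVE_TERMS = {"honest", "intrepid"}
NEGATIVE_TERMS = {"disturbing", "superfluous"}


def toy_is_positive(term: str) -> bool:
    return term in POSITIVE_TERMS


def toy_is_negative(term: str) -> bool:
    return term in NEGATIVE_TERMS


def toy_is_subjective(term: str) -> bool:
    return term in POSITIVE_TERMS | NEGATIVE_TERMS


classify_1 = approach_1(toy_is_subjective, toy_is_positive)
classify_2 = approach_2(toy_is_positive, toy_is_negative)
labels = [classify_2(t) for t in ("honest", "disturbing", "white")]
```

With perfect binary classifiers the two cascades agree; they differ in practice because an error in the first stage propagates differently in each ordering.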


Discussion

Compared with the pure term-orientation task (Positive vs. Negative only), accuracy drops substantially: in their previous work (Esuli and Sebastiani, 2005), using the same benchmark and the same algorithms to discriminate between Positive and Negative terms, the authors reported 83% accuracy, against 67% here. This result suggests that deciding subjectivity and orientation together is a substantially harder task than deciding term orientation alone.


Related Papers

  • Andrea Esuli and Fabrizio Sebastiani. Determining the semantic orientation of terms through gloss classification. In Proceedings of the 14th ACM International Conference on Information and Knowledge Management (CIKM 2005), Bremen, DE, pp. 617-624.