Extracting Opinion Expressions with semi-Markov Conditional Random Fields

Revision as of 11:21, 29 September 2012 by Ydalal

Citation

@InProceedings{yang-cardie:2012:EMNLP-CoNLL,
 author    = {Yang, Bishan and Cardie, Claire},
 title     = {Extracting Opinion Expressions with semi-Markov Conditional Random Fields},
 booktitle = {Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning},
 month     = {July},
 year      = {2012},
 address   = {Jeju Island, Korea},
 publisher = {Association for Computational Linguistics},
 pages     = {1335--1345},
 url       = {http://www.aclweb.org/anthology/D12-1122}
}


Online version

Extracting ACLWEB'2012

Summary

This paper discusses the development of SentiWordNet, a lexical resource in which each WordNet synset s is associated with three numerical scores, Obj(s), Pos(s), and Neg(s), which describe how objective, positive, and negative the terms contained in the synset are.

The motivation behind this research is to aid opinion mining by providing an off-the-shelf lexical resource that supplies fine-grained opinion tags for a large set of words.

The method used to build SentiWordNet adapts the PN-polarity [Esuli and Sebastiani, 2005] and SO-polarity [Esuli and Sebastiani, 2006] identification methods. It uses a set of ternary classifiers, each of which decides whether a synset is Positive, Negative, or Objective. The classifiers differ from one another in two respects: the training data used and the learner. As a result, each classifier may produce a different classification for the same synset. The final opinion scores are computed by normalizing the scores from all the classifiers.
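The scoring step can be illustrated with a minimal sketch. It assumes one plausible reading of "normalization": each classifier in the committee casts a single vote for a synset, and each score is the fraction of classifiers that voted for the corresponding label. The function name `committee_scores` and the example votes are hypothetical and are not taken from the paper.

```python
from collections import Counter

# The three labels a ternary classifier can assign to a synset.
LABELS = ("Positive", "Negative", "Objective")


def committee_scores(votes):
    """Turn one label per classifier into normalized Pos/Neg/Obj scores.

    Assumption (not confirmed by the summary above): the score for a
    label is the proportion of classifiers that chose that label.
    """
    if not votes:
        raise ValueError("need at least one classifier vote")
    counts = Counter(votes)
    unknown = set(counts) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    total = len(votes)
    # Counter returns 0 for labels that received no votes.
    return {label: counts[label] / total for label in LABELS}


# Hypothetical votes from a committee of eight classifiers for one synset.
votes = [
    "Positive", "Positive", "Objective", "Positive",
    "Objective", "Positive", "Negative", "Positive",
]
scores = committee_scores(votes)
print(scores)
```

Under this reading the three scores of a synset always sum to 1, so a synset on which every classifier agrees receives a score of 1 for that label and 0 for the other two.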