Difference between revisions of "Opinion mining"

From Cohen Courses
(Created page with '== Summary == Opinion mining is a [[category::problem]] in the field of information extraction that which aims to automatically extract opinion expressions from product reviews.…')
 
 
(7 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
== Summary ==
 
== Summary ==
  
Opinion mining is a [[category::problem]] in the field of information extraction that which aims to automatically extract opinion expressions from product reviews. Also one of the goal of the opinion mining techniques is to determine the opinion direction of a review.  
+
Opinion mining is a [[category::problem]] in the field of information extraction that aims to automatically extract opinion expressions from product reviews. Another goal of opinion mining techniques is to determine the opinion direction of a review.
  
Various named entity type hierarchies have been proposed in the literature, such as [http://www.ldc.upenn.edu/Catalog/docs/LDC2005T33/BBN-Types-Subtypes.html BBN's categories] (used in Question Answering) and [http://nlp.cs.nyu.edu/ene/ Sekine's Extended Named Entity Hierarchy]
+
== Common Approaches ==
  
== Common Approaches ==
+
Generally, there are two approaches to opinion mining: (1) document-level and (2) feature-level opinion mining.
 +
 
 +
* Document level
 +
** [[relatedPaper::Turney,2002]] presented an approach that calculates opinion orientation using the Web as a corpus. The input review is classified based on the average semantic orientation of all the phrases in the review. The PMI-IR technique is used to measure the semantic orientation of each phrase in the review.
  
Some common models for named entity recognition include the following:
+
** [[Turney and Littman, 2003]] extended the work of [[Turney,2002]], using cosine distance in latent semantic analysis as the distance measure.
* '''Lexicons'''
 
** Checks if a token is part of a predefined set
 
* '''Classifying pre-segmented candidates'''
 
** Manually select candidates, then use YFCL on a piece of text to determine what type of entity it is
 
* '''Sliding Window'''
 
** Try all reasonable token windows (different lengths and positions), train a [[UsesMethod::Naive Bayes]] classifier or YFCL, then extract text if Pr(class=+|prefix, contents, suffix) > some threshold
 
* '''Token Tagging / Sequential'''
 
** Classify tokens sequentially, with models like [[UsesMethod::Hidden Markov Models]], [[UsesMethod::Maximum Entropy Markov Models]], or [[Uses:Method::Conditional Random Fields]].
 
  
== Example Systems ==
+
** [[Dave et al.,2003]] introduced an approach to classify Amazon.com reviews using normalized term frequencies of unigrams, bigrams and trigrams.
* [http://nlp.stanford.edu/ner/index.shtml Stanford NER]
 
* [http://cogcomp.cs.illinois.edu/page/software_view/4 Illinois Named Entity Tagger]
 
  
== References / Links ==
 
* BBN Named Entity Types - [http://www.ldc.upenn.edu/Catalog/docs/LDC2005T33/BBN-Types-Subtypes.html]
 
* Satoshi Sekine's Extended Named Entity Hierarchy - [http://nlp.cs.nyu.edu/ene/]
 
* Wikipedia page on Named entity recognition - [http://en.wikipedia.org/wiki/Named_entity_recognition]
 
  
== Relevant Papers ==
+
* Feature level
 +
** [[Zhuang et al., 2006]] introduced a technique to classify movie reviews by extracting high-frequency feature keywords.
  
{{#ask: [[AddressesProblem::Named Entity Recognition]]
+
** [[Liu, 2004]] uses a statistical rule-based approach to extract high-frequency feature words.
| ?UsesMethod
 
| ?UsesDataset
 
}}
 

Latest revision as of 12:10, 2 December 2010

Summary

Opinion mining is a problem in the field of information extraction that aims to automatically extract opinion expressions from product reviews. Another goal of opinion mining techniques is to determine the opinion direction of a review.

Common Approaches

Generally, there are two approaches to opinion mining: (1) document-level and (2) feature-level opinion mining.

  • Document level
    • Turney,2002 presented an approach that calculates opinion orientation using the Web as a corpus. The input review is classified based on the average semantic orientation of all the phrases in the review. The PMI-IR technique is used to measure the semantic orientation of each phrase in the review.
    • Dave et al.,2003 introduced an approach to classify Amazon.com reviews using normalized term frequencies of unigrams, bigrams and trigrams.
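
The document-level idea of averaging per-phrase semantic orientation can be sketched roughly as follows. This is a minimal illustration, not the implementation from Turney,2002: the hit counts are supplied by the caller instead of being fetched from a search engine, the function names are hypothetical, and the smoothing constant is an assumption. The semantic orientation of a phrase is taken as the difference between its PMI with a positive reference word and its PMI with a negative one (Turney's paper uses "excellent" and "poor"), which reduces to a single log ratio of hit counts.

```python
import math


def semantic_orientation(near_pos, near_neg, hits_pos, hits_neg, smoothing=0.01):
    """SO(phrase) = PMI(phrase, pos_word) - PMI(phrase, neg_word).

    near_pos / near_neg: hit counts for the phrase co-occurring with the
    positive / negative reference word (hypothetical inputs).
    hits_pos / hits_neg: hit counts for the reference words on their own.
    smoothing: small constant to avoid log(0); the value is an assumption.
    """
    numerator = (near_pos + smoothing) * hits_neg
    denominator = (near_neg + smoothing) * hits_pos
    return math.log2(numerator / denominator)


def classify_review(phrase_counts, hits_pos, hits_neg):
    """Label a review by the average semantic orientation of its phrases.

    phrase_counts: list of (near_pos, near_neg) pairs, one per phrase.
    Returns (label, average_orientation).
    """
    if not phrase_counts:
        raise ValueError("a review needs at least one extracted phrase")
    scores = [
        semantic_orientation(near_pos, near_neg, hits_pos, hits_neg)
        for near_pos, near_neg in phrase_counts
    ]
    average = sum(scores) / len(scores)
    label = "positive" if average > 0 else "negative"
    return label, average
```

Calling `classify_review([(200, 100), (100, 100)], hits_pos=1000, hits_neg=1000)` labels the review as positive, because the first phrase co-occurs with the positive reference word about twice as often as with the negative one, while the second phrase is neutral.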


  • Feature level
    • Zhuang et al., 2006 introduced a technique to classify movie reviews by extracting high-frequency feature keywords.
    • Liu, 2004 uses a statistical rule-based approach to extract high-frequency feature words.
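
As a rough illustration of what "extracting high-frequency feature words" can mean, the sketch below counts how many reviews mention each word and keeps the words that reach a minimum support. It is not the method of either cited paper: the stopword list, the regular-expression tokenizer, the `min_support` parameter and the function name are all assumptions made for the example. A real system would likely restrict candidates further (for example to nouns), which is omitted here.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from any paper).
STOPWORDS = {"the", "is", "and", "a", "an", "of", "it", "this", "to"}


def frequent_feature_words(reviews, min_support=0.5):
    """Return words that occur in at least min_support of the reviews.

    reviews: list of review strings.
    min_support: fraction of reviews a word must appear in to be kept.
    """
    doc_freq = Counter()
    for review in reviews:
        # Count each word once per review (document frequency).
        tokens = set(re.findall(r"[a-z]+", review.lower())) - STOPWORDS
        doc_freq.update(tokens)
    threshold = min_support * len(reviews)
    return sorted(word for word, count in doc_freq.items() if count >= threshold)


reviews = [
    "The battery life is great",
    "Battery drains fast",
    "Great screen and battery",
]
features = frequent_feature_words(reviews, min_support=0.6)
print(features)
```

With these three toy reviews and a support of 0.6, only "battery" and "great" appear in at least two reviews, so they are the words kept as candidate features.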