Difference between revisions of "Domain-Assisted Product Aspect Hierarchy Generation: Towards Hierarchical Organization of Unstructured Consumer Reviews"

From Cohen Courses
 
This [[category::paper]] proposes to hierarchically organize consumer reviews according to an aspect hierarchy, so as to transform the reviews into a useful knowledge structure. It develops a domain-assisted approach that generates the aspect hierarchy by integrating domain knowledge with consumer reviews. The authors use external hierarchies such as WordNet and the Open Directory Project to learn the semantic distance between aspects, and they use the resulting aspect hierarchy to identify implicit aspects.
  
 +
== Dataset ==
 +
[[File:corpus.png]]
 
== Background ==
 
 +
An aspect hierarchy is defined as a tree that consists of a set of unique aspects A and a set of parent-child relations R between these aspects.
  
 
== Methodology ==
 
 +
Given the consumer reviews of a product, let <math>A = \{a_1, \cdots, a_k\}</math> denote the product aspects commented on in the reviews, and let <math>H^0(A^0, R^0)</math> denote the initial hierarchy derived from domain knowledge, which contains a set of aspects <math>A^0</math> and relations <math>R^0</math>. The task is to construct an aspect hierarchy <math>H(A, R)</math> that covers all the aspects in <math>A</math> and their parent-child relations <math>R</math>, so that the consumer reviews are hierarchically organized.
  
 +
* Initial Hierarchy Acquisition
 +
Product aspects are extracted from web documents and an initial aspect hierarchy is generated using the approach described by [[RelatedPaper::Ye and Chua (2006)]].
 +
* Aspect Identification in Customer Reviews
 +
The authors assume that noun phrases are good candidates for aspects. They therefore extract noun phrases from Pros and Cons reviews (which contain explicit descriptions of a product's pros and cons) and use them as training data for a one-class SVM classifier. The classifier is then applied to the noun phrases extracted from candidate customer reviews.
 +
* Semantic Distance Learning
 +
The authors develop the following semantic distance metric.
 +
** Linguistic Features
 +
** Semantic Distance Learning
 +
* Aspect Hierarchy Generation
 +
The aspects <math>A = \{a_1, \cdots, a_k\}</math> identified in the previous step are inserted one by one into the initial hierarchy <math>H^0(A^0, R^0)</math>. Each insertion is guided by the following information function and by a set of criteria for optimizing the resulting hierarchy.
  
 +
<math> \mathrm{Info}(H(A,R)) = \sum_{x<y;\, a_x, a_y \in A} d(a_x, a_y) </math>
 +
** Minimum hierarchy evolution: The optimal hierarchy <math>H^{(i+1)}</math> should introduce the least change in information relative to <math>H^{(i)}</math>. This gives the following objective function:
 +
***<math> obj_1 = \arg\min_{H^{(i+1)}} \left( \sum_{x<y;\, a_x, a_y \in A_i \cup \{a\}} d(a_x, a_y) - \sum_{x<y;\, a_x, a_y \in A_i} d(a_x, a_y) \right)^2 </math>
 +
** Minimum hierarchy discrepancy: A good hierarchy should bring the least change to the initial hierarchy.
 +
 +
*** <math> obj_2 = \arg\min_{H^{(i+1)}} \frac{1}{i+1} \left( \sum_{x<y;\, a_x, a_y \in A_i \cup \{a\}} d(a_x, a_y) - \sum_{x<y;\, a_x, a_y \in A^0} d(a_x, a_y) \right)^2 </math>
 +
 +
** Minimum semantic inconsistency: The semantic distance estimated from the hierarchy should be close to the distance calculated from the feature function.
 +
***<math> obj_3 = \arg\min_{H^{(i+1)}} \sum_{x<y;\, a_x, a_y \in A_i \cup \{a\}} \left( d^H(a_x, a_y) - d(a_x, a_y) \right)^2 </math>
 +
 +
The final objective function combines <math>obj_1</math>, <math>obj_2</math> and <math>obj_3</math>:
 +
** <math> obj = \arg\min_{H^{(i+1)}} (\lambda_1 \cdot obj_1 + \lambda_2 \cdot obj_2 + \lambda_3 \cdot obj_3), \quad \text{where } \lambda_1 + \lambda_2 + \lambda_3 = 1 \text{ and } 0 \le \lambda_1, \lambda_2, \lambda_3 \le 1 </math>
 +
 +
Based on the final hierarchy, the customer reviews are organized under their corresponding aspects. The aspect nodes are then pruned, and sentiment classification is performed on the reviews under each aspect.
 +
 +
* Implicit Aspect Identification
 +
The authors assume that reviews of the same aspect use the same sentiment terms, even when the aspect is only implicit [[paper:Su et al.,2008]]. Each customer review is therefore represented as a vector of sentiment terms. The average feature vector is computed for each aspect, and each implicit-aspect review is allocated to its nearest aspect node.
 +
 +
== Experiment Result ==
 +
 +
== Related Paper ==
 +
* [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1583583&tag=1 Ye and T.-S. Chua. Learning Object Models from Semi-structured Web Documents. IEEE Transactions on Knowledge and Data Engineering, 2006].
 +
*
 
== Study Plan ==
 

Revision as of 20:20, 29 September 2012

Citation

 author    = {Jianxing Yu, Zheng-Jun Zha, Meng Wang, Kai Wang, Tat-Seng Chua},
 title     = {Domain-Assisted Product Aspect Hierarchy Generation: Towards Hierarchical Organization of Unstructured Consumer Reviews},
 booktitle = {Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing},
 month     = {July},
 year      = {2011},
 pages     = {140--150},

Online version

ACLWEB 2011

Summary

This paper proposes to hierarchically organize consumer reviews according to an aspect hierarchy, so as to transform the reviews into a useful knowledge structure. It develops a domain-assisted approach that generates the aspect hierarchy by integrating domain knowledge with consumer reviews. The authors use external hierarchies such as WordNet and the Open Directory Project to learn the semantic distance between aspects, and they use the resulting aspect hierarchy to identify implicit aspects.

Dataset

Corpus.png

Background

An aspect hierarchy is defined as a tree that consists of a set of unique aspects A and a set of parent-child relations R between these aspects.
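
As a minimal sketch, the definition above can be held in a parent map: every aspect in A has at most one parent, and R is the set of (parent, child) pairs. The class name `AspectHierarchy`, its methods, and the camera aspects are illustrative assumptions and do not come from the paper.

```python
from typing import Dict, List, Optional, Set, Tuple


class AspectHierarchy:
    """Tree of unique aspects A with parent-child relations R (illustrative)."""

    def __init__(self, root: str) -> None:
        self.root = root
        # Maps each aspect to its parent; the root maps to None.
        self.parent: Dict[str, Optional[str]] = {root: None}

    def add(self, aspect: str, parent: str) -> None:
        """Attach a new aspect under an existing one."""
        if aspect in self.parent:
            raise ValueError(f"aspect {aspect!r} already present; aspects must be unique")
        if parent not in self.parent:
            raise KeyError(f"unknown parent {parent!r}")
        self.parent[aspect] = parent

    @property
    def aspects(self) -> Set[str]:
        """The aspect set A."""
        return set(self.parent)

    @property
    def relations(self) -> Set[Tuple[str, str]]:
        """The relation set R as (parent, child) pairs."""
        return {(p, c) for c, p in self.parent.items() if p is not None}

    def path_to_root(self, aspect: str) -> List[str]:
        """Aspects visited when walking from `aspect` up to the root."""
        path = [aspect]
        while self.parent[path[-1]] is not None:
            path.append(self.parent[path[-1]])
        return path


# Hypothetical camera example.
h = AspectHierarchy("camera")
h.add("lens", "camera")
h.add("battery", "camera")
h.add("zoom", "lens")
```

Storing only the parent of each node enforces the tree property (one parent per aspect) by construction, which is why the sketch does not keep a separate edge list.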

Methodology

Given the consumer reviews of a product, let <math>A = \{a_1, \cdots, a_k\}</math> denote the product aspects commented on in the reviews, and let <math>H^0(A^0, R^0)</math> denote the initial hierarchy derived from domain knowledge, which contains a set of aspects <math>A^0</math> and relations <math>R^0</math>. The task is to construct an aspect hierarchy <math>H(A, R)</math> that covers all the aspects in <math>A</math> and their parent-child relations <math>R</math>, so that the consumer reviews are hierarchically organized.

  • Initial Hierarchy Acquisition

Product aspects are extracted from web documents and an initial aspect hierarchy is generated using the approach described by Ye and Chua (2006).

  • Aspect Identification in Customer Reviews

The authors assume that noun phrases are good candidates for aspects. They therefore extract noun phrases from Pros and Cons reviews (which contain explicit descriptions of a product's pros and cons) and use them as training data for a one-class SVM classifier. The classifier is then applied to the noun phrases extracted from candidate customer reviews.

  • Semantic Distance Learning

The authors develop the following semantic distance metric.

    • Linguistic Features
    • Semantic Distance Learning
  • Aspect Hierarchy Generation

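The aspects <math>A = \{a_1, \cdots, a_k\}</math> identified in the previous step are inserted one by one into the initial hierarchy <math>H^0(A^0, R^0)</math>. Each insertion is guided by the following information function and by a set of criteria for optimizing the resulting hierarchy.

<math> \mathrm{Info}(H(A,R)) = \sum_{x<y;\, a_x, a_y \in A} d(a_x, a_y) </math>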

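    • Minimum hierarchy evolution: The optimal hierarchy <math>H^{(i+1)}</math> should introduce the least change in information relative to <math>H^{(i)}</math>. This gives the following objective function:
      • <math> obj_1 = \arg\min_{H^{(i+1)}} \left( \sum_{x<y;\, a_x, a_y \in A_i \cup \{a\}} d(a_x, a_y) - \sum_{x<y;\, a_x, a_y \in A_i} d(a_x, a_y) \right)^2 </math>
    • Minimum hierarchy discrepancy: A good hierarchy should bring the least change to the initial hierarchy.
      • <math> obj_2 = \arg\min_{H^{(i+1)}} \frac{1}{i+1} \left( \sum_{x<y;\, a_x, a_y \in A_i \cup \{a\}} d(a_x, a_y) - \sum_{x<y;\, a_x, a_y \in A^0} d(a_x, a_y) \right)^2 </math>
    • Minimum semantic inconsistency: The semantic distance estimated from the hierarchy should be close to the distance calculated from the feature function.
      • <math> obj_3 = \arg\min_{H^{(i+1)}} \sum_{x<y;\, a_x, a_y \in A_i \cup \{a\}} \left( d^H(a_x, a_y) - d(a_x, a_y) \right)^2 </math>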

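The final objective function combines <math>obj_1</math>, <math>obj_2</math> and <math>obj_3</math>:

<math> obj = \arg\min_{H^{(i+1)}} (\lambda_1 \cdot obj_1 + \lambda_2 \cdot obj_2 + \lambda_3 \cdot obj_3), \quad \text{where } \lambda_1 + \lambda_2 + \lambda_3 = 1 \text{ and } 0 \le \lambda_1, \lambda_2, \lambda_3 \le 1 </math>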
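
The insertion step can be sketched as a greedy search that tries each existing node as the parent of the new aspect and keeps the candidate with the lowest weighted score. This is only a sketch under several assumptions that are not stated in this summary: the information function is evaluated with hierarchy path distances (each edge weighted by the feature-based distance d between child and parent), the new aspect is only ever attached as a leaf, and the weights (0.4, 0.2, 0.4), the distance table `PAIRS`, and all function names are made up for illustration.

```python
from itertools import combinations
from typing import Callable, Dict, List, Optional, Tuple

Parent = Dict[str, Optional[str]]   # child -> parent, root -> None
Dist = Callable[[str, str], float]  # feature-based distance d


def ancestors(parent: Parent, node: str) -> List[str]:
    """Chain from `node` up to the root, inclusive."""
    chain = [node]
    while parent[chain[-1]] is not None:
        chain.append(parent[chain[-1]])
    return chain


def tree_distance(parent: Parent, d: Dist, x: str, y: str) -> float:
    """d^H (assumed): sum of d(child, parent) over the edges on the path x..y."""
    up_x, up_y = ancestors(parent, x), ancestors(parent, y)
    common = next(n for n in up_x if n in up_y)  # lowest common ancestor
    total = 0.0
    for chain in (up_x, up_y):
        for child in chain[:chain.index(common)]:
            total += d(child, parent[child])
    return total


def info(parent: Parent, d: Dist) -> float:
    """Info(H): sum of pairwise hierarchy distances over all aspect pairs."""
    return sum(tree_distance(parent, d, x, y)
               for x, y in combinations(sorted(parent), 2))


def score(candidate: Parent, current: Parent, initial: Parent, d: Dist,
          step: int, lambdas: Tuple[float, float, float]) -> float:
    l1, l2, l3 = lambdas
    info_new = info(candidate, d)
    obj1 = (info_new - info(current, d)) ** 2               # hierarchy evolution
    obj2 = (info_new - info(initial, d)) ** 2 / (step + 1)  # hierarchy discrepancy
    obj3 = sum((tree_distance(candidate, d, x, y) - d(x, y)) ** 2
               for x, y in combinations(sorted(candidate), 2))  # semantic inconsistency
    return l1 * obj1 + l2 * obj2 + l3 * obj3


def insert_aspect(current: Parent, initial: Parent, d: Dist, aspect: str,
                  step: int, lambdas=(0.4, 0.2, 0.4)) -> Parent:
    """Attach `aspect` under the existing node that minimises the combined score."""
    best_score, best_tree = None, None
    for host in current:
        candidate = dict(current)
        candidate[aspect] = host
        s = score(candidate, current, initial, d, step, lambdas)
        if best_score is None or s < best_score:
            best_score, best_tree = s, candidate
    return best_tree


# Toy, hand-picked distances between hypothetical camera aspects.
PAIRS = {
    frozenset(("camera", "lens")): 1.0,
    frozenset(("camera", "battery")): 1.0,
    frozenset(("lens", "battery")): 2.0,
    frozenset(("zoom", "lens")): 1.0,
    frozenset(("zoom", "camera")): 2.0,
    frozenset(("zoom", "battery")): 3.0,
}


def d(x: str, y: str) -> float:
    return PAIRS[frozenset((x, y))]


initial = {"camera": None, "lens": "camera", "battery": "camera"}
result = insert_aspect(initial, initial, d, "zoom", step=0)
```

With these toy distances the search attaches `zoom` under `lens`, the only placement for which the hierarchy distances agree with d on every pair.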

Based on the final hierarchy, the customer reviews are organized under their corresponding aspects. The aspect nodes are then pruned, and sentiment classification is performed on the reviews under each aspect.

  • Implicit Aspect Identification

The authors assume that reviews of the same aspect use the same sentiment terms, even when the aspect is only implicit (Su et al., 2008). Each customer review is therefore represented as a vector of sentiment terms. The average feature vector is computed for each aspect, and each implicit-aspect review is allocated to its nearest aspect node.
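
The allocation step amounts to a nearest-centroid assignment, sketched below. The sentiment lexicon, the whitespace tokenizer, the example reviews, and the use of cosine similarity as the notion of "nearest" are all assumptions made for illustration; the summary does not say which distance the authors use.

```python
import math
from collections import Counter
from typing import Dict, List

# Hypothetical sentiment lexicon; a real system would use a much larger one.
SENTIMENT_TERMS = ["sharp", "blurry", "long-lasting", "short", "heavy"]


def vectorize(review: str) -> List[float]:
    """Represent a review by the counts of the sentiment terms it contains."""
    counts = Counter(review.lower().split())
    return [float(counts[term]) for term in SENTIMENT_TERMS]


def centroid(vectors: List[List[float]]) -> List[float]:
    """Component-wise average of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def assign_implicit(review: str, explicit_reviews: Dict[str, List[str]]) -> str:
    """Return the aspect whose average sentiment vector is most similar to the review."""
    centroids = {
        aspect: centroid([vectorize(r) for r in reviews])
        for aspect, reviews in explicit_reviews.items()
    }
    vec = vectorize(review)
    return max(centroids, key=lambda aspect: cosine(vec, centroids[aspect]))


# Reviews already filed under an explicit aspect (made-up examples).
explicit = {
    "lens": ["the lens is sharp", "lens gives blurry photos"],
    "battery": ["battery is long-lasting", "battery life is short"],
}

aspect_a = assign_implicit("photos come out blurry", explicit)
aspect_b = assign_implicit("it is short lived", explicit)
```

Neither test sentence names an aspect, so the assignment rests entirely on the sentiment terms it shares with the explicit reviews.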

Experiment Result

Related Paper

Study Plan