Domain-Assisted Product Aspect Hierarchy Generation: Towards Hierarchical Organization of Unstructured Consumer Reviews

From Cohen Courses

== Summary ==
This [[category::paper]] proposes to hierarchically organize consumer reviews according to an aspect hierarchy, so as to transform the reviews into a useful knowledge structure. The authors develop a domain-assisted approach that generates the aspect hierarchy by integrating domain knowledge with consumer reviews, using external hierarchies such as WordNet and the Open Directory Project (ODP) to learn the semantic distance between aspects, and using the resulting hierarchy to identify implicit aspects.

Given the consumer reviews of a product, let A = {<math>a_1</math>, · · · , <math>a_k</math>} denote the product aspects commented on in the reviews, and let <math>H^0</math>(<math>A^0</math>,<math>R^0</math>) denote the initial hierarchy derived from domain knowledge, where <math>A^0</math> is the initial set of aspects and <math>R^0</math> the relations between them. The first objective of the paper is to construct an aspect hierarchy H(A,R) that covers all the aspects in A and their parent-child relations R. The second is to cluster the reviews under their aspects. The third is to identify implicit aspects in the reviews and cluster them under their respective aspects.

== Dataset ==
 
The corpus was crawled by the authors from prevalent forums such as cnet.com, viewpoints.com, reevoo.com and gsmarena.com. It contains 11 products in four domains, as shown in Table 1. The gold standard for the initial aspect hierarchy was produced with the help of human annotators.

[[File:corpus.png]]

For semantic distance learning, the authors collected 50 hierarchies from WordNet and ODP, as shown in Table 2.

[[File:ExternalHierarchies.png]]
  
 
== Background ==
 
An aspect hierarchy is defined as a tree consisting of a set of unique aspects A = {<math>a_1</math>, · · · , <math>a_k</math>} and a set of parent-child relations R between these aspects.

== Methodology ==
 
The proposed approach has four components.

1. '''Initial Hierarchy Acquisition:''' Product aspects are extracted from web documents and an initial aspect hierarchy is generated using the approach described by [[RelatedPaper::Ye and Chua (2006)]].
2. '''Aspect Identification in Customer Reviews:''' The authors assume that noun phrases are good candidates for aspects. They therefore leverage Pros and Cons reviews, which contain explicit descriptions of product pros and cons, extract noun phrases from them, and use these as training data for a one-class SVM classifier. The classifier is then applied to the noun phrases extracted from candidate customer reviews.
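The one-class idea above can be sketched as follows. This is not the authors' SVM: as a stand-in, candidate noun phrases are scored by cosine similarity to the centroid of phrases mined from Pros/Cons reviews, and accepted above a threshold. All names, the toy featurization, and the threshold are hypothetical.

```python
from collections import Counter
import math

def bow(phrase):
    """Bag-of-words vector for a noun phrase (hypothetical featurization)."""
    return Counter(phrase.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class OneClassAspectFilter:
    """Stand-in for the paper's one-class SVM: accept candidates that are
    similar enough to the centroid of phrases from Pros/Cons reviews."""
    def __init__(self, threshold=0.2):
        self.threshold = threshold
        self.centroid = Counter()

    def fit(self, pros_cons_phrases):
        for p in pros_cons_phrases:
            self.centroid.update(bow(p))
        return self

    def is_aspect(self, candidate):
        return cosine(bow(candidate), self.centroid) >= self.threshold

train = ["battery life", "screen resolution", "battery", "camera lens"]
clf = OneClassAspectFilter(threshold=0.2).fit(train)
print(clf.is_aspect("battery"))        # overlaps the training phrases
print(clf.is_aspect("my cousin bob"))  # unrelated noun phrase
```

A real implementation would replace the centroid scorer with a trained one-class SVM over richer features, but the train-on-positives-only workflow is the same.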
3. '''Semantic Distance Learning:''' The following semantic distance metric measures the distance between two aspects <math>a_x</math> and <math>a_y</math>:

<math> d( a_x , a_y ) = \sum_{j} w_j f_j (a_x , a_y) </math>

where each <math>f_j</math> is a linguistic feature function that provides a feature score for the two aspects, and <math>w_j</math> is its weight.

** ''Linguistic Features:''
*** ''Contextual feature:'' the KL-divergence score between the unigram language models of the two aspects.
**** ''Global contextual feature:'' the language model is built on the documents containing the aspect.
**** ''Local contextual feature:'' the language model is built using only the two words on each side of the aspect.
*** ''Co-occurrence feature:'' a Pointwise Mutual Information score, computed at the document level, at the sentence level, or from Google document counts.
*** ''Syntactic feature:'' the average distance between the two aspects in syntactic trees built with the Stanford parser.
*** ''Pattern feature:'' 1 if the two aspects match any of 46 patterns: 40 part-of relations [[Girju et al., 2006]] and 6 hypernym relations [[Hearst, 1992]].
*** ''Lexical feature:'' the length-difference feature (difference in aspect word length) and the definition-overlap feature (count of overlapping words in the Google definitions of the two aspects).
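The document-level co-occurrence feature can be illustrated with a minimal PMI computation. This is a generic sketch of document-level PMI, not the authors' code; the toy corpus is hypothetical.

```python
import math

def document_pmi(docs, a_x, a_y):
    """Document-level PMI between two aspect terms: log of the ratio of
    the joint document frequency to the product of the marginals."""
    n = len(docs)
    n_x = sum(1 for d in docs if a_x in d)
    n_y = sum(1 for d in docs if a_y in d)
    n_xy = sum(1 for d in docs if a_x in d and a_y in d)
    if n_xy == 0 or n_x == 0 or n_y == 0:
        return float("-inf")  # the terms never co-occur
    return math.log((n_xy / n) / ((n_x / n) * (n_y / n)))

# Toy corpus: each document is its set of terms.
docs = [
    {"phone", "battery", "life"},
    {"phone", "battery"},
    {"phone", "screen"},
    {"camera", "lens"},
]
print(document_pmi(docs, "phone", "battery"))  # positive: frequent co-occurrence
```

Sentence-level PMI is the same computation with sentences as the co-occurrence unit, and the Google variant replaces the counts with web document counts.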
** ''Semantic Distance Learning:'' the weight vector w is obtained by solving the following optimization problem:

<math> \arg \min_w \| d - f^T w \|^2 + \eta \|w\|^2 </math>

where the vector d contains the ground-truth distances of all aspect pairs, f is the matrix of feature vectors (one per pair), and <math>\eta</math> is a tradeoff parameter. The optimal solution for w is

<math> w^{\star} = (f^T f + \eta  I ) ^{-1} ( f^T d ) </math>

This learning algorithm performs well when sufficient training data is available. Since the initial hierarchies are too coarse, the authors first learn an auxiliary weight vector <math> w_0 </math> from the WordNet and Open Directory Project hierarchies, and then use <math> w_0 </math> to assist learning the optimal distance metric from the initial hierarchy. The resulting solution is

<math> w^{\star} = (f^T f + (\eta + \gamma ) I ) ^{-1} ( f^T d + \gamma w_0) </math>

where <math> \eta </math> and <math> \gamma </math> are tradeoff parameters.
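The closed-form solution above can be checked on a toy problem. This sketch hard-codes two features so the 2×2 matrix inverse can be written out by hand; with <math>\gamma = 0</math> it reduces to the plain regularized solution. The data and function names are hypothetical.

```python
def solve_w(F, d, eta, gamma, w0):
    """Closed form w* = (F^T F + (eta+gamma) I)^{-1} (F^T d + gamma*w0)
    for exactly two features, with the 2x2 inverse written out by hand."""
    # A = F^T F + (eta + gamma) I
    a = sum(f[0] * f[0] for f in F) + eta + gamma
    b = sum(f[0] * f[1] for f in F)
    e = sum(f[1] * f[1] for f in F) + eta + gamma
    # rhs = F^T d + gamma * w0
    r0 = sum(f[0] * di for f, di in zip(F, d)) + gamma * w0[0]
    r1 = sum(f[1] * di for f, di in zip(F, d)) + gamma * w0[1]
    det = a * e - b * b
    return [(e * r0 - b * r1) / det, (-b * r0 + a * r1) / det]

# Toy data: pairwise distances generated by true weights [2.0, -1.0].
F = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
true_w = [2.0, -1.0]
d = [f[0] * true_w[0] + f[1] * true_w[1] for f in F]

w = solve_w(F, d, eta=1e-8, gamma=0.0, w0=[0.0, 0.0])
print(w)  # close to [2.0, -1.0]
```

Raising <math>\gamma</math> pulls the solution toward <math>w_0</math>, which is exactly how the WordNet/ODP-learned weights assist when the initial hierarchy supplies too few training pairs.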
4. '''Aspect Hierarchy Generation:''' Aspects A = {<math>a_1</math>, · · · , <math>a_k</math>} identified in the previous steps are inserted one by one into the initial hierarchy <math>H^0</math>(<math>A^0</math>,<math>R^0</math>). Each insertion considers the following information function and set of rules for optimizing the resulting hierarchy.
  
** Minimum hierarchy evolution: the optimal hierarchy <math>H^{(i+1)}</math> introduces the least change of information relative to <math>H^{(i)}</math>. Optimize the objective function
*** <math> obj_1= \arg  \min_{ H^{(i+1)} } ( \sum_{x<y ; a_x , a_y \in A_i \cup \{a\}} d( a_x , a_y) - \sum_{x<y ; a_x , a_y \in A_i} d( a_x , a_y) )^2 </math>
** Minimum hierarchy discrepancy: a good hierarchy should bring the least change to the initial hierarchy.
*** <math> obj_2 = \arg  \min_{ H^{(i+1)} } \frac {1} {i+1} ( \sum_{x<y ; a_x , a_y \in A_i \cup \{a\}} d( a_x , a_y) - \sum_{x<y ; a_x , a_y \in A_0} d( a_x , a_y) )^2 </math>
** Minimum semantic inconsistency: the semantic distance estimated from the hierarchy should approximate the distance calculated from the feature functions.

The final objective function combines these three criteria. Based on the final hierarchy, the customer reviews are organized under their corresponding aspects; the aspect nodes are pruned, and sentiment classification is performed on the reviews under each aspect.
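The incremental insertion can be sketched in a heavily simplified form: attach each new aspect under the existing node closest to it, a toy stand-in for searching over candidate hierarchies <math>H^{(i+1)}</math> with the combined objective. The tree representation, the Jaccard "distance", and all names are hypothetical.

```python
def insert_aspect(tree, new_aspect, dist):
    """tree: dict mapping child -> parent (the root maps to None).
    Greedy sketch: attach new_aspect under the existing node with the
    smallest semantic distance to it, a stand-in for choosing the
    candidate hierarchy that minimizes the combined objective."""
    best_parent = min(tree, key=lambda node: dist(node, new_aspect))
    tree[new_aspect] = best_parent
    return best_parent

# Hypothetical distance: aspects sharing words are closer (Jaccard distance).
def dist(a, b):
    wa, wb = set(a.split()), set(b.split())
    return 1.0 - len(wa & wb) / len(wa | wb)

tree = {"phone": None, "battery": "phone", "screen": "phone"}
parent = insert_aspect(tree, "battery life", dist)
print(parent)  # "battery" shares a word with "battery life"
```

The paper's procedure instead scores whole candidate hierarchies with the three criteria above; this sketch only illustrates the one-aspect-at-a-time structure of the search.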
  
* Implicit Aspect Identification

The authors assume that reviews of an implicit aspect use the same sentiment terms as reviews of the corresponding explicit aspect [[paper:Su et al., 2008]]. Each customer review is therefore represented as a vector of sentiment terms; the average vector is computed for each aspect, and each implicit-aspect review is allocated to its nearest aspect node.
 
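The nearest-aspect allocation reads as a nearest-centroid step over sentiment-term vectors, which can be sketched as follows. The sentiment lexicon, the toy reviews, and the squared-Euclidean choice are all hypothetical.

```python
from collections import Counter

# Hypothetical sentiment lexicon defining the vector dimensions.
SENTIMENT_TERMS = ["long", "short", "sharp", "blurry", "bright", "dim"]

def sentiment_vector(review):
    words = Counter(review.lower().split())
    return [words[t] for t in SENTIMENT_TERMS]

def centroid(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def nearest_aspect(review, aspect_centroids):
    """Allocate an implicit-aspect review to the aspect whose average
    sentiment-term vector is closest (squared Euclidean distance)."""
    v = sentiment_vector(review)
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(v, c))
    return min(aspect_centroids, key=lambda asp: sq_dist(aspect_centroids[asp]))

# Explicit-aspect reviews used to build per-aspect centroids.
aspect_reviews = {
    "battery": ["long battery", "battery too short"],
    "screen": ["sharp bright screen", "screen looks blurry"],
}
cents = {a: centroid([sentiment_vector(r) for r in rs])
         for a, rs in aspect_reviews.items()}

print(nearest_aspect("it lasts long", cents))  # "long" points to battery
```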
== Experiment Result ==
 
* Aspect Identification
** The proposed approach significantly outperforms the state-of-the-art work of [[Hu and Liu, 2004]] and [[Wu et al., 2009]] in terms of <math>F_1</math>-measure, by 5.87% and 3.27% respectively.
* Aspect Hierarchy
** The results show that the pattern-based [[Hearst, 1992]] and clustering-based [[Shi et al., 2008]] methods perform poorly. The proposed method leverages external hierarchies to derive a reliable semantic distance between aspects and thus outperforms [[Snow et al., 2006]] and [[Yang and Callan, 2009]].
** Using the initial hierarchy, the proposed approach outperforms the pattern-based, clustering-based, Snow's and Yang's methods by 49.4%, 51.2%, 34.3% and 4.7% respectively.
** Domain knowledge is important in aspect hierarchy generation: the <math>F_1</math>-measure increases with the size of the initial hierarchy.
** All three optimization criteria are important.
** All the features and external hierarchies are important. External features boost the <math>F_1</math>-measure by 2.81%.
* Implicit Aspect Identification
** Using mutual clustering [[Su et al., 2008]] as the baseline, the proposed approach is 9.18% better in terms of average <math>F_1</math>-measure.
 
== Related Paper ==
 
* [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1583583&tag=1 Ye and T.-S. Chua. Learning Object Models from Semi-structured Web Documents. IEEE Transactions on Knowledge and Data Engineering, 2006].
** Shows how to create an aspect hierarchy by parsing information from webpages.
 
== Study Plan ==
 

Revision as of 00:50, 30 September 2012

== Citation ==

 author    = {Jianxing Yu, Zheng-Jun Zha, Meng Wang, Kai Wang, Tat-Seng Chua},
 title     = {Domain-Assisted Product Aspect Hierarchy Generation: Towards Hierarchical Organization of Unstructured Consumer Reviews},
 booktitle = {Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing},
 month     = {July},
 year      = {2011},
 pages     = {140--150},

== Online version ==

ACLWEB 2011
