Domain-Assisted Product Aspect Hierarchy Generation: Towards Hierarchical Organization of Unstructured Consumer Reviews

== Citation ==

 author    = {Jianxing Yu, Zheng-Jun Zha, Meng Wang, Kai Wang, Tat-Seng Chua},
 title     = {Domain-Assisted Product Aspect Hierarchy Generation: Towards Hierarchical Organization of Unstructured Consumer Reviews},
 booktitle = {Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing},
 month     = {July},
 year      = {2011},
 pages     = {140--150},

== Online version ==

ACLWEB 2011
== Summary ==

This [[category::paper]] provides an approach to [[AddressesProblem::Opinion_mining|opinion mining]] and [[AddressesProblem::Information_organization|organizing]] the extracted information. The authors propose to hierarchically organize consumer reviews according to an aspect hierarchy. An aspect is a product feature or attribute that concerns users. For example, an iPhone has aspects such as battery, OS, size, look & feel, weight, camera, memory, and applications. Given the consumer reviews of a product, let <math>A = \{a_1, \cdots, a_k\}</math> denote the product aspects commented on in the reviews, and let <math>H^0(A^0, R^0)</math> denote the initial hierarchy derived from domain knowledge, where <math>A^0</math> is the initial set of aspects and <math>R^0</math> is the set of relations between them. The first objective of this paper is to construct an aspect hierarchy <math>H(A,R)</math> that covers all the aspects in <math>A</math> and their parent-child relations <math>R</math>. The second is to cluster the reviews under their aspects. The final one is to identify the implicit aspects in product reviews and cluster them under the respective aspects.
  
 
== Dataset ==

[[File:corpus.png|500px|thumb|left|Table 1: Statistics of the review corpus; # denotes the number of reviews/sentences]]

[[File:ExternalHierarchies.png|thumb|Table 2: Statistics of external hierarchies]]

The corpus was crawled by the authors from prevalent forums such as [http://www.cnet.com cnet.com], [http://www.viewpoints.com viewpoints.com], [http://www.reevoo.com reevoo.com] and [http://www.gsmarena.com gsmarena.com]. It contains 11 products in four domains, as shown in Table 1. The initial aspect hierarchy was turned into a gold standard with the help of human annotators.

For semantic distance learning, the authors collected 50 hierarchies from [[UsesDataset::Dataset:Wordnet | WordNet]] and [[UsesDataset::Dataset:ODP | ODP]], as shown in Table 2.
 
  
 
== Background ==

An aspect hierarchy is defined as a tree that consists of a set of unique aspects <math>A = \{a_1, \cdots, a_k\}</math> and a set of parent-child relations <math>R</math> between these aspects.
 
== Methodology ==

The proposed approach has four components for aspect hierarchy generation.
  
 
1. '''Initial Hierarchy Acquisition:'''

Product aspects are extracted from web documents and an initial aspect hierarchy is generated using the approach described by [[RelatedPaper::Ye and Chua 2006]].
  
 
2. '''Aspect Identification in Customer Reviews:'''

The authors assume that noun phrases are good candidates for aspects. They therefore leverage Pros and Cons reviews (which contain explicit descriptions of product pros and cons) by extracting noun phrases from them and using these as training data for a one-class [http://en.wikipedia.org/wiki/Support_vector_machine SVM] classifier. This classifier is then used to classify the noun phrases extracted from candidate customer reviews.
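
As a concrete illustration, here is a minimal sketch of this step, assuming scikit-learn's <code>OneClassSVM</code> and a simple character n-gram representation of the noun phrases (the representation is an assumption; the paper's actual features are not reproduced here):

<pre>
# Hypothetical sketch: train a one-class SVM on noun phrases taken from
# Pros/Cons reviews, then use it to accept or reject candidate noun
# phrases from free-text reviews. Feature choice is an assumption.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import OneClassSVM

pros_cons_phrases = ["battery life", "screen size", "camera quality"]  # training NPs
candidate_phrases = ["battery", "my cousin", "touch screen"]           # NPs from reviews

vectorizer = CountVectorizer(analyzer="char_wb", ngram_range=(2, 4))
X_train = vectorizer.fit_transform(pros_cons_phrases)

# nu bounds the fraction of training outliers; the value here is arbitrary.
clf = OneClassSVM(kernel="linear", nu=0.1).fit(X_train)

X_test = vectorizer.transform(candidate_phrases)
is_aspect = clf.predict(X_test)  # +1 = accepted as aspect, -1 = rejected
print(dict(zip(candidate_phrases, is_aspect)))
</pre>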
  
 
3. '''Semantic Distance Learning:'''

The following semantic distance metric is used to measure the distance between two aspects <math>a_x</math> and <math>a_y</math>:

<math> d( a_x , a_y ) = \sum_{j} w_j f_j (a_x , a_y) </math>

where <math>f_j</math> is the <math>j^{th}</math> feature function and <math>w_j</math> its weight. The features and their functions are defined as follows (a small sketch of two such feature functions appears after this list):

*''Linguistic features:''
**''Contextual feature:'' the KL-divergence score between the unigram language models of the two aspects.
***''Global contextual feature:'' the language model is built on the documents containing the aspect.
***''Local contextual feature:'' the language model is built using only the two words on each side of the aspect.
**''Co-occurrence feature:'' a Pointwise Mutual Information score, computed at the document level, at the sentence level, or from Google document counts.
**''Syntactic feature:'' the average distance between the two aspects in syntactic trees built with the Stanford parser.
**''Pattern feature:'' 1 if the two aspects match any of 46 patterns: 40 part-of relations [[RelatedPaper::Girju et al., 2006]] and 6 hypernym relations [[RelatedPaper::Hearst, 1992]].
**''Lexical feature:'' a length difference feature (the difference in word length of the aspects) and a definition overlap feature (the count of overlapping words in the Google definitions of the aspects).
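
For illustration, here is a minimal sketch of two of these feature functions (an add-alpha smoothed unigram KL divergence and sentence-level PMI), with toy counts standing in for real corpus statistics:

<pre>
# Hypothetical sketch of two feature functions over toy corpus statistics.
import math
from collections import Counter

def kl_divergence(p_counts, q_counts, alpha=0.5):
    """KL(P || Q) between two add-alpha smoothed unigram models."""
    vocab = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + alpha * len(vocab)
    q_total = sum(q_counts.values()) + alpha * len(vocab)
    score = 0.0
    for w in vocab:
        p = (p_counts[w] + alpha) / p_total
        q = (q_counts[w] + alpha) / q_total
        score += p * math.log(p / q)
    return score

def sentence_pmi(n_x, n_y, n_xy, n_sents):
    """PMI of two aspects from sentence-level co-occurrence counts."""
    p_x, p_y, p_xy = n_x / n_sents, n_y / n_sents, n_xy / n_sents
    return math.log(p_xy / (p_x * p_y))

ctx_battery = Counter({"drains": 5, "charge": 7, "hours": 4})
ctx_screen = Counter({"bright": 6, "touch": 5, "hours": 1})
print(kl_divergence(ctx_battery, ctx_screen))
print(sentence_pmi(n_x=120, n_y=90, n_xy=30, n_sents=5000))
</pre>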
  
The weight parameters in the above metric are learnt by solving the following optimization problem:

<math> \arg\min_w \| d - f^T w \|^2 + \eta \|w\|^2 </math>

where the vector <math>d</math> contains the ground-truth distances of all aspect pairs. The ground-truth distance between two aspects <math>a_x</math> and <math>a_y</math> is obtained by summing the edge distances along the shortest path between them in the initial hierarchy, with every edge weight set to 1. <math>f</math> is the feature vector for the corresponding pair, and <math>\eta</math> is a tradeoff parameter.
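
A minimal sketch of computing such ground-truth distances, treating the initial hierarchy as an undirected tree with unit edge weights (the toy hierarchy below is illustrative, not the paper's data):

<pre>
# Hypothetical sketch: ground-truth distance = shortest-path length between
# two aspects in the initial hierarchy, with every edge weight equal to 1.
from collections import deque

# Toy initial hierarchy as parent -> children (not the paper's data).
children = {"phone": ["battery", "camera"], "camera": ["lens", "flash"]}

# Build an undirected adjacency list.
adj = {}
for parent, kids in children.items():
    for kid in kids:
        adj.setdefault(parent, set()).add(kid)
        adj.setdefault(kid, set()).add(parent)

def tree_distance(a_x, a_y):
    """BFS shortest-path length between two aspect nodes."""
    queue, seen = deque([(a_x, 0)]), {a_x}
    while queue:
        node, dist = queue.popleft()
        if node == a_y:
            return dist
        for nbr in adj[node] - seen:
            seen.add(nbr)
            queue.append((nbr, dist + 1))
    return float("inf")

print(tree_distance("battery", "lens"))  # battery -> phone -> camera -> lens = 3
</pre>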
  
The optimal solution for <math>w</math> in the above equation is

<math> w^{\star} = (f^T f + \eta  I ) ^{-1} ( f^T d ) </math>
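
A minimal numpy sketch of this closed-form ridge solution, with random stand-ins for the feature matrix and ground-truth distance vector:

<pre>
# Hypothetical sketch of the closed-form ridge solution
#   w* = (F^T F + eta I)^{-1} F^T d
# F: one row of feature values per aspect pair; d: ground-truth distances.
import numpy as np

rng = np.random.default_rng(0)
n_pairs, n_features = 200, 9
F = rng.normal(size=(n_pairs, n_features))   # stand-in feature matrix
d = rng.normal(size=n_pairs)                 # stand-in ground-truth distances
eta = 0.1

w_star = np.linalg.solve(F.T @ F + eta * np.eye(n_features), F.T @ d)
print(w_star)
</pre>

With real data, each row of <code>F</code> would stack the feature-function values for one aspect pair.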
  
 
The above learning algorithm performs well when sufficient training data is available. Since the initial hierarchies are too coarse, the authors use the WordNet and Open Directory Project hierarchies to learn an auxiliary weight vector <math> w_0 </math>, which is then used to assist in learning the optimal distance metric from the initial hierarchy. The resulting solution is

<math> w^{\star} = (f^T f + (\eta + \gamma ) I ) ^{-1} ( f^T d + \gamma w_0) </math>

where <math> \eta</math> and <math>\gamma</math> are tradeoff parameters.
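
The same sketch extended with the auxiliary prior <math>w_0</math>:

<pre>
# Hypothetical sketch: the same closed form with an auxiliary prior w_0
# learnt from external hierarchies,
#   w* = (F^T F + (eta + gamma) I)^{-1} (F^T d + gamma * w_0)
import numpy as np

rng = np.random.default_rng(1)
n_pairs, n_features = 200, 9
F = rng.normal(size=(n_pairs, n_features))
d = rng.normal(size=n_pairs)
w_0 = rng.normal(size=n_features)            # stand-in prior from WordNet/ODP
eta, gamma = 0.1, 0.5

w_star = np.linalg.solve(F.T @ F + (eta + gamma) * np.eye(n_features),
                         F.T @ d + gamma * w_0)
print(w_star)
</pre>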
  
 
4. '''Aspect Hierarchy Generation:'''

The aspects <math>A = \{a_1, \cdots, a_k\}</math> identified in step 2 are inserted one by one into the initial hierarchy <math>H^0(A^0, R^0)</math>.

Each insertion is guided by the information function

<math> \text{Info}(H(A,R)) = \sum_{x<y; a_x , a_y \in A} d( a_x, a_y ) </math>

and the following criteria for optimizing the resulting hierarchy:

i. ''Minimum hierarchy evolution:'' the optimal hierarchy <math>H^{(i+1)}</math> introduces the least change of information relative to <math>H^{(i)}</math>:

<math>obj_1 = \arg\min_{ H^{(i+1)} } \Big( \sum_{x<y ; a_x , a_y \in A_i \cup \{a\}} d( a_x , a_y) - \sum_{x<y ; a_x , a_y \in A_i} d( a_x , a_y) \Big)^2 </math>

ii. ''Minimum hierarchy discrepancy:'' a good hierarchy should bring the least change to the initial hierarchy:

<math>obj_2 = \arg\min_{ H^{(i+1)} } \frac {1} {i+1} \Big(\sum_{x<y ; a_x , a_y \in A_i \cup \{a\}} d( a_x , a_y) - \sum_{x<y ; a_x , a_y \in A_0} d( a_x , a_y) \Big)^2 </math>

iii. ''Minimum semantic inconsistency:'' the semantic distances estimated from the hierarchy should approximate those calculated from the feature functions:

<math>obj_3 = \arg\min_{ H^{(i+1)} } \sum_{x<y ; a_x , a_y \in A_i \cup \{a\}} \big(d^H ( a_x, a_y ) - d( a_x , a_y )\big)^2</math>

The final objective function combines <math> obj_1 , obj_2, obj_3 </math>; a minimal sketch of the resulting greedy insertion step follows the formula:

<math> obj = \arg\min_{ H^{(i+1)} } ( \lambda_1 \cdot obj_1 + \lambda_2 \cdot obj_2 + \lambda_3 \cdot obj_3 )</math> where <math>\lambda_1 + \lambda_2 + \lambda_3 = 1</math> and <math>0 \le \lambda_1, \lambda_2, \lambda_3 \le 1</math>.
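
The sketch below scores candidate parents with only the semantic-inconsistency term for brevity; the toy distances and this simplification are assumptions, not the paper's exact procedure:

<pre>
# Hypothetical sketch: greedily attach a new aspect under the candidate
# parent that minimizes a semantic-inconsistency-style score, a simplified
# stand-in for the paper's combined objective over obj_1, obj_2 and obj_3.
import itertools

# Toy learned pairwise distances d(a_x, a_y) from the metric above.
d = {
    frozenset(("phone", "battery")): 1.0, frozenset(("phone", "camera")): 1.1,
    frozenset(("camera", "lens")): 0.9, frozenset(("battery", "lens")): 2.8,
    frozenset(("phone", "lens")): 1.9, frozenset(("battery", "camera")): 2.1,
}

parent_of = {"battery": "phone", "camera": "phone"}  # current hierarchy

def path_to_root(node):
    chain = [node]
    while chain[-1] in parent_of:
        chain.append(parent_of[chain[-1]])
    return chain

def tree_dist(a, b):
    """Hop distance between two nodes in the current hierarchy tree."""
    pa, pb = path_to_root(a), path_to_root(b)
    common = set(pa) & set(pb)
    return min(pa.index(c) for c in common) + min(pb.index(c) for c in common)

def inconsistency(aspect, parent):
    """Squared gap between tree distances and learned semantic distances."""
    parent_of[aspect] = parent            # tentatively attach the aspect
    nodes = list(parent_of) + ["phone"]
    score = sum((tree_dist(x, y) - d[frozenset((x, y))]) ** 2
                for x, y in itertools.combinations(nodes, 2))
    del parent_of[aspect]                 # undo the tentative attachment
    return score

best_parent = min(["phone", "battery", "camera"],
                  key=lambda p: inconsistency("lens", p))
parent_of["lens"] = best_parent
print(parent_of)  # 'lens' ends up attached under 'camera'
</pre>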

'''Review Organization'''

Based on the final hierarchy, the customer reviews are organized under their corresponding aspects. The aspect nodes are pruned, and sentiment classification is performed on the reviews under each aspect.
  
''Implicit Aspect Identification:''

The authors assume that reviews mentioning an aspect only implicitly use the same sentiment terms as reviews that mention it explicitly [[RelatedPaper::Su et al, 2008]]. Each customer review is therefore represented by a vector of sentiment terms. The average feature vector is calculated for each aspect, and each implicit-aspect review is allocated to its nearest aspect node.
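
A minimal sketch of this nearest-centroid assignment over toy sentiment-term vectors (the vocabulary and counts are illustrative only):

<pre>
# Hypothetical sketch: represent reviews as sentiment-term vectors, average
# them per aspect, and assign implicit-aspect reviews to the nearest centroid.
import numpy as np

sentiment_vocab = ["short", "blurry", "sharp", "long-lasting"]  # toy vocabulary

# Reviews with known aspects, as counts over the sentiment vocabulary.
labeled = {
    "battery": np.array([[1, 0, 0, 2], [2, 0, 0, 1]], dtype=float),
    "camera":  np.array([[0, 2, 1, 0], [0, 1, 2, 0]], dtype=float),
}
centroids = {a: v.mean(axis=0) for a, v in labeled.items()}

def nearest_aspect(review_vec):
    return min(centroids, key=lambda a: np.linalg.norm(centroids[a] - review_vec))

implicit_review = np.array([0, 1, 1, 0], dtype=float)  # "blurry ... sharp ..."
print(nearest_aspect(implicit_review))  # expect 'camera'
</pre>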
  
 
== Experiment Result ==

*Aspect Identification
**The proposed approach significantly outperforms the state-of-the-art work of [[RelatedPaper::Hu and Liu, 2004]] and [[RelatedPaper::Wu et al., 2009]] in terms of <math>F_1</math>-measure, by 5.87% and 3.27% respectively.

*Aspect Hierarchy
**The results show that the pattern-based [[RelatedPaper::Hearst, 1992]] and clustering-based [[RelatedPaper::Shi et al., 2008]] methods perform poorly. The proposed method leverages external hierarchies to derive reliable semantic distances between aspects and thus outperforms [[RelatedPaper::Snow et al., 2006]] and [[RelatedPaper::Yang and Callan 2009]].
**Using the initial hierarchy, the proposed approach outperforms the pattern-based, clustering-based, Snow's, and Yang's methods by 49.4%, 51.2%, 34.3% and 4.7% respectively.
**Domain knowledge is important in aspect hierarchy generation, as the <math>F_1</math>-measure increases with a larger initial hierarchy.
**All three optimization criteria are important.
**All the features and external hierarchies are important. External features boost the <math>F_1</math>-measure by 2.81%.

*Implicit Aspect Identification
**The authors use mutual clustering [[RelatedPaper::Su et al, 2008]] as the baseline and show that the proposed approach is 9.18% better in terms of average <math>F_1</math>-measure.
  
 
== Related Paper ==

* [[RelatedPaper::Ye and Chua 2006]]
* [[RelatedPaper::Hu and Liu, 2004]]
* [[RelatedPaper::Girju et al., 2006]]
* [[RelatedPaper::Hearst, 1992]]
 
== Study Plan ==

* [[RelatedPaper::Ye and Chua 2006]]
** Learn how to create an aspect hierarchy by parsing information from webpages.
* Learn about the [[Method::Support_Vector_Machines| SVM]] classifier, as it is used to test candidate noun phrases in the aspect identification process.
* [[Girju et al., 2006]]
** Learn about part-of relation patterns.
* [[Hearst, 1992]]
** Learn about hypernym relation patterns.
* The reader should know basic linear algebra to understand the optimization functions used in this paper.
