Difference between revisions of "Domain-Assisted Product Aspect Hierarchy Generation: Towards Hierarchical Organization of Unstructured Consumer Reviews"
(37 intermediate revisions by the same user not shown) | |||
Line 13: | Line 13: | ||
== Summary == | == Summary == | ||
− | This [[category::paper]] propose to hierarchically organize consumer reviews according to an aspect hierarchy, | + | This [[category::paper]] provides an approach to [[AddressesProblem::Opinion_mining|opinion mining]] and [[AddressesProblem::Information_organization|organizing]] the extracted information. The authors propose to hierarchically organize consumer reviews according to an aspect hierarchy. An aspect is a product feature or attribute that concern the users. For example, an Iphone has following aspects: battery, OS, size, look & feel, weight, camera, memory, applications etc. Therefore, given the consumer reviews of a product, if <math>A=</math>{<math>a_1</math>,· · · ,<math>a_k</math>} denotes the product aspects commented in the reviews. And <math>H^0</math> <math>(A^0 , R^0)</math> denotes the initial hierarchy derived from domain knowledge where <math>A^0</math> is the initial set of aspects and <math>R^0</math> is the relations between them. The first objective of this paper is to construct an aspect hierarchy <math>H(A,R)</math>, that covers all the aspects in <math>A</math> and their parent-child relations <math>R</math>. Secondly, cluster the review under aspects. Finally, identify the implicit aspects from product reviews and cluster them under respective aspects. |
+ | |||
== Dataset == | == Dataset == | ||
− | The corpus is crawled by authors from the prevalent | + | [[File:corpus.png|500px|thumb|alt=left|Table.1 Statistics of the reviews corpus, # denotes the size of the reviews/sentences]] |
− | + | [[File:ExternalHierarchies.png|thumb|Table.2: Statistics of External Hierarchies]] | |
− | For semantic learning 50 hierarchies | + | |
− | [[ | + | The corpus is crawled by authors from the prevalent forums such as [http://www.cnet.com cnet.com], [http://www.viewpoints.com viewpoints.com], [http://www.reevoo.com reevoo.com] and [http://www.gsmarena.com gsmarena.com]. It contains 11 products in four domain as shown in table 1. The initial aspect hierarchy was made gold standard with the help of human annotators. |
+ | |||
+ | For semantic learning they have collected 50 hierarchies from [[UsesDataset::Dataset:Wordnet | Wordnet]] and [[UsesDataset::Dataset:ODP | ODP]] as shown in table 2. | ||
== Background == | == Background == | ||
− | An aspect hierarchy is | + | An aspect hierarchy is defined as a tree that consists of a set of unique aspects A = {<math>a_1</math>, · · · , <math>a_k</math>} and a set of parent-child relations R between these aspects. |
== Methodology == | == Methodology == | ||
− | |||
− | + | The proposed approach has four components for aspect hierarchy generation. | |
− | Product aspects are extracted from web documents and an initial aspect hierarchy is generated using the approach described by [[RelatedPaper::Ye and Chua | + | |
− | + | 1.'''Initial Hierarchy Acquisition :''' | |
− | The authors assume that noun phrases are good candidates for aspects. Therefore they leverage the pros and con reviews ( contains explicit product pros and cons description) by extracting noun phrases from them and use them as the training data for a single class SVM classifier. This classifier is then used to test the noun phrases extracted from candidate customer reviews. | + | |
− | + | Product aspects are extracted from web documents and an initial aspect hierarchy is generated using the approach described by [[RelatedPaper::Ye and Chua 2006]]. | |
− | + | ||
− | * | + | 2.'''Aspect Identification in Customer Reviews :''' |
− | ** | + | |
− | * | + | The authors assume that noun phrases are good candidates for aspects. Therefore they leverage the pros and con reviews ( contains explicit product pros and cons description) by extracting noun phrases from them and use them as the training data for a single class [http://en.wikipedia.org/wiki/Support_vector_machine SVM] classifier. This classifier is then used to test the noun phrases extracted from candidate customer reviews. |
− | + | ||
+ | 3.'''Semantic Distance Learning :''' Use the following semantic distance metric to measure the distance between two aspects, <math>a_x, a_y</math>. | ||
+ | <math> d( a_x , a_y ) = \sum_{j} w_j f_j (a_x , a_y) </math>. Where <math>f_j</math> is <math>j^{th}</math> feature function. The features and their functions are defined as follows: | ||
+ | *''Linguistic Features :'' | ||
+ | **''Contextual feature :'' KL-divergence score between unigram language model of two aspects. | ||
+ | ***''Global contextual feature :'' The language model is build on document containing the aspect. | ||
+ | ***''Local contextual feature :'' The language model is build using only two words from each side of the aspect. | ||
+ | **''Co-occurrence feature :'' It is Pointwise Mutual Information score. It can be built at document level, sentence level or using Google document count. | ||
+ | **''Syntactic feature :'' Average distance between two aspects in a syntactic tree built using Stanford parser. | ||
+ | **''Pattern feature :'' It's 1, if the two aspects match any of the 46 patterns. 40 part-of relations [[RelatedPaper::Girju et al., 2006]] and 6 hypernym relations [[RelatedPaper::Hearst, 1992]]. | ||
+ | **''Lexical feature :'' Length difference feature, difference in aspect word length. Definition overlap feature, count of word overlapping in Google definitions of aspects. | ||
+ | |||
+ | The weight parameters in the previous equation are learnt using the following optimization problem: | ||
+ | |||
+ | <math> arg min_w || d - f^T w ||^2 + \eta ||w||^2 </math> | ||
+ | |||
+ | where vector d is the ground-truth distance of all the aspect pairs. And the ground-truth distance between two aspects is generated by summing up all the edge distances along the shortest path between <math>a_x and a_y</math>, in the initial hierarchy and every edge weight is assumed as 1. | ||
+ | |||
+ | f is the feature vector for corresponding pair. <math>\eta</math> is the tradeoff parameter. | ||
+ | |||
+ | The optimal solution for w in the above equation is defined as | ||
+ | |||
+ | <math> w^{\star} = (f^T f + \eta I ) ^{-1} ( f^T d ) </math> | ||
+ | |||
+ | The above learning algorithm can perform well when sufficient training data is available. Since the initial hierarchy are too coarse the author uses the WordNet and OpenDirectory Project hierarchies to learn | ||
+ | <math> w_0 </math>. And <math>w_0</math> is used to assist learning the optimal distance metric from initial hierarchy. This can be represented as the following problem. | ||
+ | |||
+ | <math> w^{\star} = (f^T f + (\eta + \gamma ) I ) ^{-1} ( f^T d + \gamma w_0) </math> | ||
− | <math> \ | + | Where <math> \eta</math> and <math>\gamma</math> are tradeoff parameters. |
− | |||
− | |||
− | |||
− | + | 4.'''Aspect Hierarchy Generation''' | |
+ | Aspects, A = {<math>a_1</math>, · · · , <math>a_k</math>} identified from the step 2 are then inserted one by one into initial <math>H^0</math> (<math>A^0</math>,<math>R^0</math>). | ||
− | + | The insertion is done considering the following information function and set of rules for optimizing the resulting hierarchy. | |
− | + | <math> \text{Info(H(A,R))= } \sum_{x<y; a_x , a_y \in A} d( a_x, a_y ) </math>. | |
+ | |||
+ | i.''Minimum hierarchy evolution :'' The optimal hierarchy <math>H^{(i+1)}</math> introduces the least changes of information <math>H^i</math>. Optimize the following objective function | ||
+ | <math>obj_1 = arg min_{ H^{(i+1)} } ( \sum_{x<y ; a_x , a_y \in A_i \cup {a}} d( a_x , a_y) - \sum_{x<y ; a_x , a_y \in A_i} d( a_x , a_y) )^2 </math>. | ||
+ | ii.''Minimum hierarchy discrepancy :'' A good hierarchy should bring least changes to initial hierarchy. | ||
+ | <math>obj_2 = arg min_{ H^{(i+1)} } \frac {1} {i+1} (\sum_{x<y ; a_x , a_y \in A_i \cup {a}} d( a_x , a_y) - \sum_{x<y ; a_x , a_y \in A_0} d( a_x , a_y) ))^2 </math> | ||
+ | iii.''Minimum semantic inconsistency :'' Semantic distance estimated from hierarchy should be approximate to that calculated from feature function. | ||
+ | <math>obj_3 = arg min_{ H^{(i+1)} } \sum_{x<y ; a_x , a_y \in A_i \cup {a}} (d^H ( a_x, a_y ) - d( a_x , a_y ))^2</math> | ||
Final objective function is defined using <math> obj_1 , obj_2, obj_3 </math> | Final objective function is defined using <math> obj_1 , obj_2, obj_3 </math> | ||
− | + | <math> obj = arg min_{ H^{(i+1)} } ( \lambda_1 \star obj_1 + \lambda_2 \star obj_2 + \lambda_3 \star obj_3 )</math> where <math>\lambda_1 + \lambda_2 + \lambda_3 =1;</math> and <math>0 \le \lambda_1, \lambda_2, \lambda_3 \le 1</math>. | |
+ | |||
+ | |||
+ | '''Review Organization''' | ||
Based on the final hierarchy the customer reviews are organized under their corresponding aspect. The aspect nodes are pruned and sentiment classification is done on reviews under given aspect. | Based on the final hierarchy the customer reviews are organized under their corresponding aspect. The aspect nodes are pruned and sentiment classification is done on reviews under given aspect. | ||
− | + | ''Implicit Aspect Identification :'' | |
− | The author assumes that implicit aspect reviews use same sentiment terms for same aspect [[ | + | The author assumes that implicit aspect reviews use same sentiment terms for same aspect [[RelatedPaper::Su et al, 2008]]. Therefore a customer review is represented by a vector of sentiment terms. Following this calculate the average feature vector for each aspect and then allocate each implicit aspect review to its nearest aspect node. |
== Experiment Result == | == Experiment Result == | ||
+ | *Aspect Identification | ||
+ | **The proposed approach significantly outperforms state of art, [[RelatedPaper::RelatedPaper::Hu and Liu, 2004]] and [[RelatedPaper::Wu et al., 2009]], work in terms of <math>F_1-measure</math> by 5.87% and 3.27% respectively. | ||
+ | *Aspect Hierarchy | ||
+ | **The results show that pattern-based, [[RelatedPaper::Hearst, 1992]], and clustering-based,[[RelatedPaper::Shi et al., 2008]] methods perform poor. The proposed method leverages external hierarchies to derive reliable semantic distance between aspects and thus outperforms [[RelatedPaper::Snow et al., 2006]] and [[RelatedPaper::Yang and Callan 2009]]. | ||
+ | **Using initial hierarchy the proposed approach outperforms pattern-based, clustering-based, Snow's and Yang's method by 49.4%, 51.2%, 34.3% and 4.7% respectively. | ||
+ | **Domain knowledge is important in aspect hierarchy generation as it is seen that <math>F_1-measure</math> increases with larger size of initial hierarchy. | ||
+ | **All three optimization criteria are important. | ||
+ | **All the features and external hierarchies are important. External features boost <math>F_1-measure</math> by 2.81%. | ||
+ | *Implicit Aspect Identification | ||
+ | **The authors have used mutual clustering, [[RelatedPaper::Su et al, 2008]], as the base line and shown that the proposed approach is 9.18% better in terms of average <math>F_1-measure</math>. | ||
== Related Paper == | == Related Paper == | ||
− | * [ | + | * [[RelatedPaper::Ye and Chua 2006]]. |
− | * | + | * [[RelatedPaper::Hu and Liu, 2004]] |
+ | * [[Girju et al., 2006]] | ||
+ | * [[Hearst, 1992]] | ||
+ | |||
== Study Plan == | == Study Plan == | ||
+ | * [[RelatedPaper::Ye and Chua 2006]]. | ||
+ | ** Learn how to create aspect hierarchy by parsing information from webpages. | ||
+ | * Learn about [[Method::Support_Vector_Machines| SVM]] classifier as it is used to test candidate noun-phrases in aspect identification process. | ||
+ | * [[Girju et al., 2006]] | ||
+ | ** Learn about part-of relations patterns. | ||
+ | * [[Hearst, 1992]] | ||
+ | **Learn about hypernym relations patterns. | ||
+ | * The user should know basic linear algebra to understand the optimization functions used in this paper. |
Latest revision as of 19:22, 3 October 2012
Contents
Citation
author = {Jianxing Yu, Zheng-Jun Zha, Meng Wang, Kai Wang, Tat-Seng Chua}, title = {Domain-Assisted Product Aspect Hierarchy Generation: Towards Hierarchical Organization of Unstructured Consumer Reviews}, booktitle = {Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing}, month = {July}, year = {2011}, pages = {140--150},
Online version
Summary
This paper provides an approach to opinion mining and organizing the extracted information. The authors propose to hierarchically organize consumer reviews according to an aspect hierarchy. An aspect is a product feature or attribute that concern the users. For example, an Iphone has following aspects: battery, OS, size, look & feel, weight, camera, memory, applications etc. Therefore, given the consumer reviews of a product, if {,· · · ,} denotes the product aspects commented in the reviews. And denotes the initial hierarchy derived from domain knowledge where is the initial set of aspects and is the relations between them. The first objective of this paper is to construct an aspect hierarchy , that covers all the aspects in and their parent-child relations . Secondly, cluster the review under aspects. Finally, identify the implicit aspects from product reviews and cluster them under respective aspects.
Dataset
The corpus is crawled by authors from the prevalent forums such as cnet.com, viewpoints.com, reevoo.com and gsmarena.com. It contains 11 products in four domain as shown in table 1. The initial aspect hierarchy was made gold standard with the help of human annotators.
For semantic learning they have collected 50 hierarchies from Wordnet and ODP as shown in table 2.
Background
An aspect hierarchy is defined as a tree that consists of a set of unique aspects A = {, · · · , } and a set of parent-child relations R between these aspects.
Methodology
The proposed approach has four components for aspect hierarchy generation.
1.Initial Hierarchy Acquisition :
Product aspects are extracted from web documents and an initial aspect hierarchy is generated using the approach described by Ye and Chua 2006.
2.Aspect Identification in Customer Reviews :
The authors assume that noun phrases are good candidates for aspects. Therefore they leverage the pros and con reviews ( contains explicit product pros and cons description) by extracting noun phrases from them and use them as the training data for a single class SVM classifier. This classifier is then used to test the noun phrases extracted from candidate customer reviews.
3.Semantic Distance Learning : Use the following semantic distance metric to measure the distance between two aspects, . . Where is feature function. The features and their functions are defined as follows:
- Linguistic Features :
- Contextual feature : KL-divergence score between unigram language model of two aspects.
- Global contextual feature : The language model is build on document containing the aspect.
- Local contextual feature : The language model is build using only two words from each side of the aspect.
- Co-occurrence feature : It is Pointwise Mutual Information score. It can be built at document level, sentence level or using Google document count.
- Syntactic feature : Average distance between two aspects in a syntactic tree built using Stanford parser.
- Pattern feature : It's 1, if the two aspects match any of the 46 patterns. 40 part-of relations Girju et al., 2006 and 6 hypernym relations Hearst, 1992.
- Lexical feature : Length difference feature, difference in aspect word length. Definition overlap feature, count of word overlapping in Google definitions of aspects.
- Contextual feature : KL-divergence score between unigram language model of two aspects.
The weight parameters in the previous equation are learnt using the following optimization problem:
where vector d is the ground-truth distance of all the aspect pairs. And the ground-truth distance between two aspects is generated by summing up all the edge distances along the shortest path between , in the initial hierarchy and every edge weight is assumed as 1.
f is the feature vector for corresponding pair. is the tradeoff parameter.
The optimal solution for w in the above equation is defined as
The above learning algorithm can perform well when sufficient training data is available. Since the initial hierarchy are too coarse the author uses the WordNet and OpenDirectory Project hierarchies to learn . And is used to assist learning the optimal distance metric from initial hierarchy. This can be represented as the following problem.
Where and are tradeoff parameters.
4.Aspect Hierarchy Generation Aspects, A = {, · · · , } identified from the step 2 are then inserted one by one into initial (,).
The insertion is done considering the following information function and set of rules for optimizing the resulting hierarchy.
.
i.Minimum hierarchy evolution : The optimal hierarchy introduces the least changes of information . Optimize the following objective function
.
ii.Minimum hierarchy discrepancy : A good hierarchy should bring least changes to initial hierarchy.
iii.Minimum semantic inconsistency : Semantic distance estimated from hierarchy should be approximate to that calculated from feature function.
Final objective function is defined using
where and .
Review Organization
Based on the final hierarchy the customer reviews are organized under their corresponding aspect. The aspect nodes are pruned and sentiment classification is done on reviews under given aspect.
Implicit Aspect Identification : The author assumes that implicit aspect reviews use same sentiment terms for same aspect Su et al, 2008. Therefore a customer review is represented by a vector of sentiment terms. Following this calculate the average feature vector for each aspect and then allocate each implicit aspect review to its nearest aspect node.
Experiment Result
- Aspect Identification
- The proposed approach significantly outperforms state of art, RelatedPaper::Hu and Liu, 2004 and Wu et al., 2009, work in terms of by 5.87% and 3.27% respectively.
- Aspect Hierarchy
- The results show that pattern-based, Hearst, 1992, and clustering-based,Shi et al., 2008 methods perform poor. The proposed method leverages external hierarchies to derive reliable semantic distance between aspects and thus outperforms Snow et al., 2006 and Yang and Callan 2009.
- Using initial hierarchy the proposed approach outperforms pattern-based, clustering-based, Snow's and Yang's method by 49.4%, 51.2%, 34.3% and 4.7% respectively.
- Domain knowledge is important in aspect hierarchy generation as it is seen that increases with larger size of initial hierarchy.
- All three optimization criteria are important.
- All the features and external hierarchies are important. External features boost by 2.81%.
- Implicit Aspect Identification
- The authors have used mutual clustering, Su et al, 2008, as the base line and shown that the proposed approach is 9.18% better in terms of average .
Related Paper
Study Plan
- Ye and Chua 2006.
- Learn how to create aspect hierarchy by parsing information from webpages.
- Learn about SVM classifier as it is used to test candidate noun-phrases in aspect identification process.
- Girju et al., 2006
- Learn about part-of relations patterns.
- Hearst, 1992
- Learn about hypernym relations patterns.
- The user should know basic linear algebra to understand the optimization functions used in this paper.