Domain-Assisted Product Aspect Hierarchy Generation: Towards Hierarchical Organization of Unstructured Consumer Reviews
Citation
author = {Jianxing Yu, Zheng-Jun Zha, Meng Wang, Kai Wang, Tat-Seng Chua}, title = {Domain-Assisted Product Aspect Hierarchy Generation: Towards Hierarchical Organization of Unstructured Consumer Reviews}, booktitle = {Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing}, month = {July}, year = {2011}, pages = {140--150},
Online version
Summary
This paper proposes to organize consumer reviews hierarchically according to an aspect hierarchy, transforming the reviews into a useful knowledge structure. It develops a domain-assisted approach that generates the aspect hierarchy by integrating domain knowledge with consumer reviews. The authors use external hierarchies such as WordNet and the Open Directory Project (ODP) to learn the semantic distance between aspects, and they use the aspect hierarchy to identify implicit aspects.
Dataset
The corpus was crawled by the authors from prevalent forum websites such as cnet.com, viewpoints.com, reevoo.com and gsmarena.com. It covers 11 products in four domains. Human annotators turned the initial hierarchy into a gold standard. For semantic distance learning, 50 hierarchies were taken from WordNet and ODP.
Background
An aspect hierarchy is defined as a tree that consists of a set of unique aspects A and a set of parent-child relations R between these aspects.
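As an illustration only, the definition can be sketched as a small data structure. The class and method names are hypothetical. R is stored as a child-to-parent map, which guarantees each aspect has at most one parent, and `add` only attaches an aspect under a node already in the tree, so the structure stays a single tree over unique aspects.

```python
from dataclasses import dataclass, field


@dataclass
class AspectHierarchy:
    """Hypothetical tree over unique aspects A with parent-child relations R.

    R is kept as a child -> parent dict, so every aspect has at most one parent.
    """
    root: str
    parent: dict[str, str] = field(default_factory=dict)

    @property
    def aspects(self) -> set[str]:
        # A: the root plus every aspect that has a parent
        return {self.root} | set(self.parent)

    @property
    def relations(self) -> set[tuple[str, str]]:
        # R: (parent, child) pairs
        return {(p, c) for c, p in self.parent.items()}

    def add(self, child: str, parent: str) -> None:
        # Aspects must be unique, and the parent must already be in the tree,
        # which keeps the structure one connected tree.
        if child in self.aspects:
            raise ValueError(f"duplicate aspect: {child}")
        if parent not in self.aspects:
            raise KeyError(f"unknown parent: {parent}")
        self.parent[child] = parent

    def path_to_root(self, aspect: str) -> list[str]:
        # Follow parent links upward until the root is reached
        path = [aspect]
        while path[-1] != self.root:
            path.append(self.parent[path[-1]])
        return path
```

For example, after `h = AspectHierarchy("camera")`, `h.add("battery", "camera")` and `h.add("battery life", "battery")`, the call `h.path_to_root("battery life")` returns `["battery life", "battery", "camera"]`.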
Methodology
Given the consumer reviews of a product, let $A = \{a_1, \cdots, a_k\}$ denote the product aspects commented on in the reviews, and let $H^0(A^0, R^0)$ denote the initial hierarchy derived from domain knowledge, which contains a set of aspects $A^0$ and relations $R^0$. The task is to construct an aspect hierarchy $H(A, R)$ that covers all the aspects in $A$ and their parent-child relations $R$, so that the consumer reviews are hierarchically organized.
- Initial Hierarchy Acquisition
Product aspects are extracted from web documents and an initial aspect hierarchy is generated using the approach described by Ye and Chua (2006).
- Aspect Identification in Customer Reviews
The authors assume that noun phrases are good aspect candidates. They therefore leverage Pros and Cons reviews (which contain explicit descriptions of product pros and cons): noun phrases extracted from these reviews serve as training data for a one-class SVM classifier. The classifier is then applied to the candidate noun phrases extracted from the consumer reviews to decide which of them are aspects.
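This summary does not say which features or kernel were used, so the following is a minimal sketch under stated assumptions: each noun phrase is a normalised bag-of-words vector, the kernel is linear, and the ν-one-class SVM primal $\frac{1}{2}\lVert w\rVert^2 + \frac{1}{\nu n}\sum_i \max(0, \rho - w \cdot x_i) - \rho$ is minimised by plain subgradient descent rather than a QP solver. Noun-phrase extraction is assumed to be done already, and all function names and hyperparameters are hypothetical.

```python
import math
from collections import Counter


def featurize(phrase: str, vocab: dict[str, int]) -> list[float]:
    """Bag-of-words vector over `vocab`, L2-normalised; unknown words are dropped."""
    vec = [0.0] * len(vocab)
    for word, count in Counter(phrase.lower().split()).items():
        if word in vocab:
            vec[vocab[word]] += count
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def train_one_class_svm(phrases, nu=0.5, lr=0.1, epochs=200):
    """Hypothetical linear-kernel nu-one-class SVM fit by full-batch subgradient descent."""
    vocab: dict[str, int] = {}
    for p in phrases:
        for word in p.lower().split():
            vocab.setdefault(word, len(vocab))
    xs = [featurize(p, vocab) for p in phrases]
    n, dim = len(xs), len(vocab)
    w, rho = [0.0] * dim, 0.0
    for _ in range(epochs):
        # Training points with w.x < rho violate the margin and enter the subgradient
        violators = [x for x in xs if sum(wi * xi for wi, xi in zip(w, x)) < rho]
        grad_w = list(w)  # gradient of 0.5 * ||w||^2
        for x in violators:
            for j in range(dim):
                grad_w[j] -= x[j] / (nu * n)
        grad_rho = len(violators) / (nu * n) - 1.0
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        rho -= lr * grad_rho
    return vocab, w, rho


def decision(model, phrase: str) -> float:
    """Signed score w.x - rho; the usual one-class SVM rule treats >= 0 as in-class."""
    vocab, w, rho = model
    x = featurize(phrase, vocab)
    return sum(wi * xi for wi, xi in zip(w, x)) - rho
```

Training on phrases such as "battery life" and "screen" and then scoring candidates ranks a phrase sharing words with the Pros/Cons training data (e.g. "the battery") above one with no overlap (e.g. "my sister"). A kernelised solver would be the more usual choice in practice; the linear primal is used here only to keep the sketch dependency-free.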
- Semantic Distance Learning
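The authors develop a metric for the semantic distance between aspects, based on linguistic features of aspect pairs and learned with the help of external hierarchies (WordNet and ODP).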
- Linguistic Features
- Semantic Distance Learning
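The summary does not reproduce the metric, so the following is only a sketch under an explicit assumption: the distance is taken to be a weighted linear combination of per-pair linguistic feature scores, $d(a_x, a_y) = \sum_j w_j f_j(a_x, a_y)$, and the weights are fitted by ridge regression to target distances read off a reference hierarchy (here, the number of tree edges between two aspects). The function names, the ridge formulation, and the toy data are hypothetical.

```python
def tree_distance(parent: dict[str, str], a: str, b: str) -> int:
    """Number of edges between a and b in a tree stored as a child -> parent dict."""
    def ancestors(node):
        chain = [node]
        while node in parent:
            node = parent[node]
            chain.append(node)
        return chain

    depth_in_a = {node: i for i, node in enumerate(ancestors(a))}
    for j, node in enumerate(ancestors(b)):
        if node in depth_in_a:  # first shared ancestor is the lowest common ancestor
            return depth_in_a[node] + j
    raise ValueError("nodes are not in the same tree")


def fit_ridge(features, targets, lam=0.0):
    """Solve (F^T F + lam*I) w = F^T d by Gaussian elimination with partial pivoting."""
    k = len(features[0])
    a = [[sum(f[i] * f[j] for f in features) + (lam if i == j else 0.0)
          for j in range(k)] for i in range(k)]
    b = [sum(f[i] * t for f, t in zip(features, targets)) for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for c in range(col, k):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    w = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        w[i] = (b[i] - sum(a[i][j] * w[j] for j in range(i + 1, k))) / a[i][i]
    return w


def semantic_distance(weights, feature_vector) -> float:
    """d(a_x, a_y) as a weighted sum of the pair's feature scores."""
    return sum(w * f for w, f in zip(weights, feature_vector))
```

In this setup each training row would hold the feature scores of one aspect pair taken from a WordNet or ODP hierarchy, and its target would be `tree_distance` for that pair; a small positive `lam` keeps the system well conditioned when features are correlated.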
- Aspect Hierarchy Generation
The aspects $A = \{a_1, \cdots, a_k\}$ identified in the previous step are then inserted one by one into the initial hierarchy $H^0(A^0, R^0)$. Each insertion is guided by an information function and the following criteria for optimizing the resulting hierarchy.
- Minimum hierarchy evolution: the optimal hierarchy introduces the least change of information, so each insertion minimizes the change in the information function.
- Minimum hierarchy discrepancy: a good hierarchy should bring the fewest changes to the initial hierarchy.
- Minimum semantic inconsistency: the semantic distance estimated from the hierarchy should approximate the distance calculated from the feature function.
The final objective function combines these three criteria.
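The summary names the criteria but not their formulas, so the following is a minimal sketch of greedy one-by-one insertion with hypothetical proxies. It assumes that the information function Info(H) is the sum of tree distances over all aspect pairs, that evolution is the increase in Info caused by an insertion, that inconsistency is the squared gap between tree distance and a learned semantic distance `sem_dist`, and that the final objective is a weighted sum with hypothetical weights `lam_evo` and `lam_sem`. Because each new aspect is attached as a leaf, no relation of the initial hierarchy changes, so the discrepancy term is zero in this simplified version.

```python
from itertools import combinations


def tree_distance(parent, a, b):
    """Edges between a and b in a tree stored as a child -> parent dict."""
    def ancestors(node):
        chain = [node]
        while node in parent:
            node = parent[node]
            chain.append(node)
        return chain

    depth_in_a = {node: i for i, node in enumerate(ancestors(a))}
    for j, node in enumerate(ancestors(b)):
        if node in depth_in_a:  # first shared ancestor is the lowest common ancestor
            return depth_in_a[node] + j
    raise ValueError("nodes are not in the same tree")


def info(parent, nodes):
    """Hypothetical information function: sum of tree distances over all pairs."""
    return sum(tree_distance(parent, a, b) for a, b in combinations(nodes, 2))


def insert_aspects(parent, root, new_aspects, sem_dist, lam_evo=0.1, lam_sem=1.0):
    """Greedily attach each new aspect as a leaf under the cheapest existing node."""
    parent = dict(parent)  # copy, so the initial hierarchy H0 is left untouched
    nodes = [root] + list(parent)
    for aspect in new_aspects:
        best_cost, best_parent = None, None
        for candidate in nodes:
            # A leaf under `candidate` is one edge farther from every x than candidate is
            dists = {x: tree_distance(parent, candidate, x) + 1 for x in nodes}
            evolution = sum(dists.values())  # equals info(H_new) - info(H_old)
            inconsistency = sum((dists[x] - sem_dist(aspect, x)) ** 2 for x in nodes)
            cost = lam_evo * evolution + lam_sem * inconsistency
            if best_cost is None or cost < best_cost:
                best_cost, best_parent = cost, candidate
        parent[aspect] = best_parent
        nodes.append(aspect)
    return parent
```

With an initial hierarchy `{"battery": "camera", "lens": "camera"}` and a `sem_dist` that places "battery life" closest to "battery", the sketch attaches "battery life" under "battery". Greedy insertion fixes each decision once it is made; the order in which aspects are inserted can therefore change the result.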
Based on the final hierarchy, the customer reviews are organized under their corresponding aspects. The aspect nodes are then pruned, and sentiment classification is performed on the reviews under each aspect.
- Implicit Aspect Identification
The authors assume that reviews with implicit aspects use the same sentiment terms as reviews of the same aspect (Su et al., 2008). Each customer review is therefore represented by a vector of sentiment terms. The average feature vector is then calculated for each aspect, and each implicit-aspect review is allocated to its nearest aspect node.
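The allocation step is a nearest-centroid rule, sketched below under stated assumptions: a small hypothetical sentiment lexicon, term-count vectors, and Euclidean distance to each aspect's average vector (the summary does not name the distance measure). The lexicon, function names, and example reviews are invented for illustration.

```python
import math
import re
from collections import Counter

SENTIMENT_TERMS = ["long", "short", "sharp", "blurry", "bright", "dim"]  # hypothetical lexicon


def sentiment_vector(review: str) -> list[float]:
    """Count of each lexicon term in the review (case-insensitive, punctuation ignored)."""
    counts = Counter(re.findall(r"[a-z]+", review.lower()))
    return [float(counts[t]) for t in SENTIMENT_TERMS]


def centroids(explicit_reviews: dict[str, list[str]]) -> dict[str, list[float]]:
    """Average sentiment-term vector of the reviews already filed under each aspect."""
    result = {}
    for aspect, reviews in explicit_reviews.items():
        vecs = [sentiment_vector(r) for r in reviews]
        result[aspect] = [sum(col) / len(vecs) for col in zip(*vecs)]
    return result


def assign_implicit(review: str, cents: dict[str, list[float]]) -> str:
    """Allocate an implicit-aspect review to the aspect with the nearest centroid."""
    v = sentiment_vector(review)
    return min(cents, key=lambda aspect: math.dist(v, cents[aspect]))
```

For instance, with battery reviews that mostly say "long" and lens reviews that say "sharp", the implicit review "it lasts long" is allocated to the battery node. Cosine distance would be a reasonable alternative when reviews differ greatly in length.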