Domain-Assisted Product Aspect Hierarchy Generation: Towards Hierarchical Organization of Unstructured Consumer Reviews

From Cohen Courses
Revision as of 21:32, 29 September 2012 by Ydalal (talk | contribs)

Citation

 author    = {Jianxing Yu, Zheng-Jun Zha, Meng Wang, Kai Wang, Tat-Seng Chua},
 title     = {Domain-Assisted Product Aspect Hierarchy Generation: Towards Hierarchical Organization of Unstructured Consumer Reviews},
 booktitle = {Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing},
 month     = {July},
 year      = {2011},
 pages     = {140--150},

Online version

ACLWEB 2011

Summary

This paper proposes to organize consumer reviews hierarchically according to an aspect hierarchy, turning the reviews into a useful knowledge structure. It develops a domain-assisted approach that generates the aspect hierarchy by integrating domain knowledge with consumer reviews. The authors use external hierarchies such as WordNet and the Open Directory Project (ODP) to learn the semantic distance between aspects, and they use the resulting aspect hierarchy to identify implicit aspects.

Dataset

The authors crawled the corpus from popular forum and review websites such as cnet.com, viewpoints.com, reevoo.com and gsmarena.com. It covers 11 products in four domains. Gold-standard hierarchies were created with the help of human annotators. [Figure: Corpus.png]

For semantic distance learning, 50 hierarchies are taken from WordNet and ODP. [Figure: ExternalHierarchies.png]

Background

An aspect hierarchy is defined as a tree that consists of a set of unique aspects A and a set of parent-child relations R between these aspects.
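To make the definition concrete, here is a minimal sketch of such a tree in Python. The class name `AspectHierarchy`, its methods, and the phone example are illustrative assumptions, not something taken from the paper. Storing a single parent per child keeps aspects unique and guarantees a tree shape.

```python
from dataclasses import dataclass, field


@dataclass
class AspectHierarchy:
    """A tree of unique aspects A with parent-child relations R (illustrative)."""
    root: str
    parent: dict = field(default_factory=dict)  # child aspect -> parent aspect

    def aspects(self):
        # A: the root plus every aspect that has a parent.
        return {self.root} | set(self.parent)

    def relations(self):
        # R: the set of (parent, child) pairs.
        return {(p, c) for c, p in self.parent.items()}

    def add(self, child, parent):
        # Aspects must be unique, and the parent must already be in the tree.
        if child in self.aspects():
            raise ValueError(f"aspect {child!r} already present")
        if parent not in self.aspects():
            raise ValueError(f"unknown parent {parent!r}")
        self.parent[child] = parent

    def path_to_root(self, aspect):
        # Walk parent links upward until the root is reached.
        path = [aspect]
        while path[-1] != self.root:
            path.append(self.parent[path[-1]])
        return path


h = AspectHierarchy(root="phone")
h.add("battery", "phone")
h.add("battery life", "battery")
h.add("screen", "phone")
```

Here `h.aspects()` plays the role of A and `h.relations()` the role of R.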

Methodology

Given the consumer reviews of a product, let $A = \{a_1, \cdots, a_k\}$ denote the product aspects commented on in the reviews, and let $H^0(A^0, R^0)$ denote the initial hierarchy derived from domain knowledge, which contains a set of aspects $A^0$ and relations $R^0$. The task is to construct an aspect hierarchy $H(A, R)$ that covers all the aspects in $A$ and their parent-child relations $R$, so that the consumer reviews are hierarchically organized.

  • Initial Hierarchy Acquisition

Product aspects are extracted from web documents and an initial aspect hierarchy is generated using the approach described by Ye and Chua (2006).

  • Aspect Identification in Customer Reviews

The authors assume that noun phrases are good candidates for aspects. They therefore extract noun phrases from Pros and Cons reviews (which contain explicit descriptions of a product's pros and cons) and use them as training data for a one-class SVM classifier. This classifier is then applied to the noun phrases extracted from candidate consumer reviews to decide which of them are aspects.

  • Semantic Distance Learning

The authors develop the following semantic distance metric.

    • Linguistic Features
    • Semantic Distance Learning
  • Aspect Hierarchy Generation

The aspects $A = \{a_1, \cdots, a_k\}$ identified in the previous step are then inserted one by one into the initial hierarchy $H^0(A^0, R^0)$. The insertion is guided by the following information function and set of rules for optimizing the resulting hierarchy.

    • Minimum hierarchy evolution: the optimal hierarchy introduces the least change of information, which is expressed as an objective function to be minimized.
    • Minimum hierarchy discrepancy: a good hierarchy should introduce the least change to the initial hierarchy.
    • Minimum semantic inconsistency: the semantic distance estimated from the hierarchy should be close to the one calculated from the feature functions.

The final objective function is defined by combining the three criteria above.

Based on the final hierarchy, the consumer reviews are organized under their corresponding aspects. The aspect nodes are pruned, and sentiment classification is performed on the reviews under each aspect.
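The one-by-one insertion can be illustrated with a much simplified greedy sketch. It is not the authors' optimization: it scores only the semantic inconsistency criterion, it only attaches a new aspect as a leaf, it measures distance in the hierarchy as the number of edges between two nodes, and it takes the feature-based distance as a given function. The names `insert_aspects`, `tree_distance` and `dist`, and the toy distance values, are assumptions made for this example.

```python
def path_to_root(parent, node):
    # Follow child -> parent links until a node without a parent (the root).
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def tree_distance(parent, a, b):
    """Number of edges on the path between a and b in the tree."""
    depth_a = {n: i for i, n in enumerate(path_to_root(parent, a))}
    for j, n in enumerate(path_to_root(parent, b)):
        if n in depth_a:  # first common ancestor
            return depth_a[n] + j
    raise ValueError("nodes are not in the same tree")


def insert_aspects(root, parent, new_aspects, dist):
    """Greedily attach each new aspect under the parent with the lowest cost."""
    parent = dict(parent)  # work on a copy of the initial hierarchy
    nodes = [root] + list(parent)
    for aspect in new_aspects:
        def cost(candidate):
            # Squared gap between tree distance and feature-based distance,
            # summed over all aspects already in the hierarchy.
            trial = {**parent, aspect: candidate}
            return sum(
                (tree_distance(trial, aspect, x) - dist(aspect, x)) ** 2
                for x in nodes
            )
        parent[aspect] = min(nodes, key=cost)
        nodes.append(aspect)
    return parent


# Toy feature-based distances (made-up numbers, symmetric lookup).
toy = {
    frozenset(("zoom", "lens")): 1.0,
    frozenset(("zoom", "camera")): 2.0,
    frozenset(("zoom", "battery")): 3.0,
}


def dist(a, b):
    return toy[frozenset((a, b))]


initial = {"lens": "camera", "battery": "camera"}
final = insert_aspects("camera", initial, ["zoom"], dist)
print(final["zoom"])  # lens
```

With these toy distances, attaching "zoom" under "lens" reproduces all three distances exactly, so it has zero cost and is selected. A fuller version would add terms for hierarchy evolution and hierarchy discrepancy to `cost`.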

  • Implicit Aspect Identification

The authors assume that implicit-aspect reviews use the same sentiment terms for the same aspect (Su et al., 2008). A consumer review is therefore represented as a vector of sentiment terms. The average feature vector is then computed for each aspect, and each implicit-aspect review is allocated to its nearest aspect node.
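A minimal sketch of this nearest-aspect allocation follows. The sentiment vocabulary, the example reviews, the whitespace tokenization, and the use of cosine similarity as the notion of "nearest" are all assumptions made for illustration; the summary above does not specify them.

```python
import math
from collections import Counter


def vectorize(text, vocab):
    # Count how often each sentiment term occurs in the review.
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocab]


def centroid(vectors):
    # Average feature vector of the reviews under one aspect.
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def assign_implicit(review, labelled, vocab):
    """Return the aspect whose average vector is most similar to the review."""
    centroids = {
        aspect: centroid([vectorize(r, vocab) for r in reviews])
        for aspect, reviews in labelled.items()
    }
    vec = vectorize(review, vocab)
    return max(centroids, key=lambda aspect: cosine(vec, centroids[aspect]))


# Hypothetical sentiment vocabulary and explicit-aspect reviews.
vocab = ["expensive", "cheap", "heavy", "light"]
labelled = {
    "price": ["too expensive", "really cheap price"],
    "weight": ["very heavy", "quite light weight"],
}

print(assign_implicit("it is so expensive", labelled, vocab))  # price
```

The review "it is so expensive" never names the price aspect, but it shares a sentiment term with the reviews under "price", so it is allocated to that node.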

Experiment Result

Related Paper

Study Plan