Extracting Opinion Expressions with semi-Markov Conditional Random Fields

Citation

 author    = {Yang, Bishan  and  Cardie, Claire},
 title     = {Extracting Opinion Expressions with semi-Markov Conditional Random Fields},
 booktitle = {Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning},
 month     = {July},
 year      = {2012},
 address   = {Jeju Island, Korea},
 publisher = {Association for Computational Linguistics},
 pages     = {1335--1345},

Online version

ACLWEB 2012

Summary

This paper proposes an opinion expression extraction algorithm. It uses a segment level sequence labeling technique using semi-CRFs. The main focus of the paper is to identify two types of opinion expressions in the corpus. First, direct subjective expressions. Secondly, direct expressive subjective expressions. For Exmaple :

"The International Committee of the Red Cross, [as usual $]_{ESE}$ ,[has refused to make any statements $]_{DSE}$ ".
"The Chief Minister [said $]_{DSE}$ that [the demon they have reared will eat up their own vitals $]_{ESE}$ ".

Dataset

The algorithm is evaluated on Multi Perspective Question Answering (MPQA) dataset. The dataset contains 535 news articles and 11,114 sentences with 55.89% sentences with DSEs and 57.93% with ESEs. 135 documents are used for training and 400 are used for testing.

Background

The previous work of sequence tagging in natural language processing has been limited to word level. This paper extends it to phrase level. Sarawagi and Cohen, 2004 has shown that semi-CRFs outperform CRFs in Named Entity Recognition. So the authors try a new extended semi-CRF model to opinion expression extraction task and measure it's performance with semi-CRF and CRFs for the same task.

Methodology

Semi-CRF

A sentence s is divided into segments $<s_{1},...,s_{n}>$ . Where $s_{i}$ = $(t_{i},u_{i},y_{i})$ such that $t_{i}$ is the start position of segment $s_{i}$ , $u_{i}$ is the end position and $y_{i}$ is the label of the segment. Segment length is limited to maximum length seen in the corpus.

Feature function g(x,s,i) is short representation of $g(s,t_{i},u_{i},y_{i},y_{i-1})$ . The conditional probability of a segmentation s give a sequence x is defined as $p(s|x)={\frac {1}{\sum _{s^{'}\in S}exp{\sum _{i}\sum _{k}\lambda _{k}g_{k}(i,x,s^{'})}}}exp{\sum _{i}\sum _{k}\lambda _{k}g_{k}(i,x,s)}$ .

The correct segmentation s of a sentence is defined as a sequence of entity segments(DSE or ESE) and non-entity segments (they are unit length segments that are to be ignored).

Extended Semi-CRF for Opinion Expression Extraction

Fig.1 Sentence parse tree

Fig.2 Segment Construction Algorithm

The objective is to learn the entity boundaries and labels for opinion expression extraction.

First modification, the segment length should not be fixed to maximum segment length based on observed entities, it should be unbounded to allow any length segment candidates.
Second,the segment units are generated from sentence parse tree,Fig.1.

Segment candidates are constructed using the segment construction algorithm Fig.2. Function $commGroup(U_{i},...,U_{j})$ returns true if parent node of $U_{i},...,U_{j}$ have the same rightmost child in their subtrees, otherwise it returns false.

The above generated candidate segments are then validated as follows.

For opinion expressions that do not match any segment candidate, break them down into smaller segment using a greedy match process.
- Starting from start position of the expression, search for the longest candidate that is contained in expression, add it to the correct segmentation for the sentence.
- Move the start position to the next position and repeat previous.

The segment training data generated from previous step is then used in training semi-CRF model. The authors have used BFGS algorithm,Liu and Nocedal, 1989,for optimizing gradient of log-likelihood L.

${\frac {dL}{d\lambda {_{k}}}}=\sum _{i}g_{k}(x,t_{i},u_{i},y_{i},y_{i-1})-\sum _{s^{'}\in S}\sum _{y,y^{'}}\sum _{j}g_{k}(x,t_{j}^{'},u_{j}^{'},y,y^{'})p(y,y^{'}|x)$

where, S is all possible segmentation consisting of generated segment candidates. $p(y,y^{'}|x)$ is probability of having label y for the current segment $s_{j}^{'}$ ( with boundary $(t_{j}^{'},u_{j}^{'}))$ and $y^{'}$ is label for previous segment $s_{j-1}^{'}$ .

Forward-Backward algorithm is used to compute marginal distribution $p(y,y^{'}|x)$ and normalization factor Z(x). For inference best segmentation is arg $max_{s}$ p(s|x). An efficient inference is implemented by extending Viterbi algorithm to segments.

Features

There are two set of features. First, CRF-style features, string representation of word, its part-of-speech and strong or weak subjectivity feature derived from subjectivity lexicon provided by Wilson et al. 2005. Breck et al.2007 have used similar definition in their work.

Secondly, following segment-level syntactic features are used to capture syntactic patterns of opinion expressions. Since most of the opinion expression involved verb phrases(VP),therefore following features are used to capture VP-related constituents.

VPRoot: A VP constituent whose parent node is not VP.
VPLeaf: A VP constituent whose children nodes are non-VP.
- VPcluster: Indicates whether or not segment matches verb-cluster strucutre.
- VPpred: A feature of the syntactic category and the word of the head of VPLeaf. The head of VPLeaf is the predicate of the verb phrase, which may encode some intention of opinions in the verb phrase.
- VParg: A feature syntactic category and the head word of the argument in VPLeaf.
- VPsubj:Whether the verb clusters or the argument in the segment contains an entry from the subjectivity lexicon.

Experimental Results

Evaluation Metrics

Binary Overlap : Predicted expression is correct if it overlaps with a correct expression.
Proportional Overlap : Only the overlapping proportion of predicted expression over correct expression is consider to be correct.

Baseline Methods

Token-level CRF approach, Breck et al.2007, is used as the baseline on MPQA dataset.
Two variation of standard CRF are used. First, segment-CRF, treats segment units obtained from parser as work tokens. Second, Syntactic-CRF, encodes segment-level syntactic information in a standard token-level CRF as input features.
Semi-CRF model ,Sarawagi and Cohen, 2004, is also used as baseline.

Results

The extended semi-CRF is labeled as new-semi-CRF.

Comparison with previous work.

Discussion

The extended semi-CRF approach outperforms original semi-CRF,Sarawagi and Cohen, 2004. But as compared to CRF it has lower precision and high recall. This is because the current approach predicted nearly twice the number of DSEs as compared to CRF and this lead to high recall and low precision. Overall the F-measure is boosted as compared to CRF.

The authors propose to add new features and better way to model context surrounding to improve performance. One should note that the semi-CRFs take longer time to train and validate then CRFs. The proposed approach took 2.25 hours for training 11,114 sentences. 2 hours for parsing sentences using Stanford Parser and 15 minutes training on a 4GB RAM, Intel Core 2 Duo CPU.

Study Plan

Learn Semi-CRF model,Sarawagi and Cohen, 2004.
- John D. Lafferty, Andrew McCallum, and Fernando C. N. Pereira. 2001. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In Proceedings of ICML ’01.pdf
  - Conditional Random Fields
- Learn Forward-backward algorithm.
- Learn Viterbi algorithm.
Learn BFGS algorithm, Liu and Nocedal, 1989.
Learn polarity identification defined in Wilson et al. 2005.