Difference between revisions of "Extracting Opinion Expressions with semi-Markov Conditional Random Fields"

Revision as of 23:01, 1 October 2012

Citation

 author    = {Yang, Bishan  and  Cardie, Claire},
 title     = {Extracting Opinion Expressions with semi-Markov Conditional Random Fields},
 booktitle = {Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning},
 month     = {July},
 year      = {2012},
 address   = {Jeju Island, Korea},
 publisher = {Association for Computational Linguistics},
 pages     = {1335--1345},

Online version

ACLWEB 2012

Summary

This paper proposes a segment-level sequence labeling technique based on semi-Markov conditional random fields (semi-CRFs). Its main focus is identifying two types of opinion expressions in a corpus: direct subjective expressions (DSEs) and expressive subjective elements (ESEs). For example:

"The International Committee of the Red Cross, [as usual]$_{ESE}$, [has refused to make any statements]$_{DSE}$."
"The Chief Minister [said]$_{DSE}$ that [the demon they have reared will eat up their own vitals]$_{ESE}$."

Dataset

The MPQA 1.2 corpus (Wiebe et al., 2005) is used. It contains 535 news articles and 11,114 sentences; 55.89% of the sentences contain DSEs and 57.93% contain ESEs. 135 documents are used for training and 400 for testing.

Background

Previous work on sequence tagging in natural language processing has been limited to the token level.

Methodology

Semi-CRF

A sentence $x$ is divided into segments $s=\langle s_{1},...,s_{n}\rangle$, where $s_{i}=(t_{i},u_{i},y_{i})$: $t_{i}$ is the start position of segment $s_{i}$, $u_{i}$ is its end position and $y_{i}$ is its label. Segment length is limited to the maximum length seen in the corpus. Each feature function depends only on the input, the current segment and the previous label: $g_{k}(i,x,s)=g_{k}(x,t_{i},u_{i},y_{i},y_{i-1})$. The conditional probability of a segmentation $s$ given a sequence $x$ is defined as $p(s|x)=\frac{1}{\sum_{s'\in S}\exp\left(\sum_{i}\sum_{k}\lambda_{k}g_{k}(i,x,s')\right)}\exp\left(\sum_{i}\sum_{k}\lambda_{k}g_{k}(i,x,s)\right)$, where $S$ is the set of all possible segmentations of $x$ and $\lambda_{k}$ is the weight of feature $g_{k}$.
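The definition of $p(s|x)$ can be checked on a toy input by enumerating every segmentation explicitly. The following is only a minimal sketch under stated assumptions, not the paper's implementation: the label name "O" for non-entity segments, the "START" placeholder for the label before the first segment, the two toy feature functions and their weights are all hypothetical, and brute-force enumeration is only feasible for very short sentences.

```python
import math

LABELS = ("DSE", "ESE", "O")  # "O" for non-entity segments is an assumed label name


def segmentations(n, max_len, labels=LABELS):
    """Yield every labeled segmentation of token positions 0..n-1
    as a list of (t, u, y) triples with inclusive start/end positions."""
    def rec(start):
        if start == n:
            yield []
            return
        for length in range(1, min(max_len, n - start) + 1):
            end = start + length - 1
            for y in labels:
                for rest in rec(end + 1):
                    yield [(start, end, y)] + rest
    yield from rec(0)


def score(x, s, weights, features):
    """Compute sum_i sum_k lambda_k * g_k(x, t_i, u_i, y_i, y_{i-1})."""
    total = 0.0
    y_prev = "START"  # assumed placeholder for the label before the first segment
    for t, u, y in s:
        for lam, g in zip(weights, features):
            total += lam * g(x, t, u, y, y_prev)
        y_prev = y
    return total


def prob(x, s, weights, features, max_len):
    """p(s|x): exponentiated score of s, normalized over all segmentations."""
    z = sum(math.exp(score(x, s2, weights, features))
            for s2 in segmentations(len(x), max_len))
    return math.exp(score(x, s, weights, features)) / z


# Hypothetical toy features, for illustration only.
def g_said_is_dse(x, t, u, y, y_prev):
    return 1.0 if y == "DSE" and x[t:u + 1] == ["said"] else 0.0


def g_unit_other(x, t, u, y, y_prev):
    return 1.0 if y == "O" and t == u else 0.0


x = ["He", "said", "so"]
features = [g_said_is_dse, g_unit_other]
weights = [2.0, 1.0]
gold = [(0, 0, "O"), (1, 1, "DSE"), (2, 2, "O")]
p_gold = prob(x, gold, weights, features, max_len=3)
```

With these toy weights the segmentation `gold` receives the highest probability, and the probabilities of all segmentations sum to one. A practical implementation would replace the enumeration with a semi-Markov forward recursion for the normalizer.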

The correct segmentation $s$ of a sentence is a sequence of entity segments (DSE or ESE) and non-entity segments; the latter are unit-length segments that are to be ignored.
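As an illustration of this definition, a gold segmentation can be built from annotated entity spans by filling every uncovered token with a unit-length non-entity segment. This is a sketch under assumptions: the function name `to_segmentation`, the label "O" for non-entity segments and the tokenization of the example sentence are hypothetical choices, not taken from the paper.

```python
def to_segmentation(n_tokens, entities):
    """Turn non-overlapping entity spans into a full segmentation.

    entities: list of (start, end, label) with inclusive token indices.
    Tokens outside every entity become unit-length "O" segments
    ("O" is an assumed name for the non-entity label).
    """
    segments = []
    pos = 0
    for start, end, label in sorted(entities):
        if start < pos or end < start or end >= n_tokens:
            raise ValueError(f"invalid or overlapping span: {(start, end, label)}")
        # Unit-length non-entity segments before this entity.
        segments.extend((i, i, "O") for i in range(pos, start))
        segments.append((start, end, label))
        pos = end + 1
    # Unit-length non-entity segments after the last entity.
    segments.extend((i, i, "O") for i in range(pos, n_tokens))
    return segments


tokens = ("The Chief Minister said that the demon they have reared "
          "will eat up their own vitals .").split()
entities = [(3, 3, "DSE"), (5, 15, "ESE")]
segs = to_segmentation(len(tokens), entities)
```

Here `segs` covers every token exactly once: the DSE "said", the ESE spanning "the demon ... vitals", and one unit-length segment for each remaining token.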

Extended Semi-CRF for Opinion Expression Extraction

The objective is to learn the entity boundaries and labels for opinion expression extraction.

First, the segment length should not be capped at the maximum length of the observed entities; it should be unbounded so that candidate segments of any length are allowed.
Second, the segment units are generated from the sentence's parse tree.
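To make the second point concrete, the sketch below lists the token spans covered by the constituents of a parse tree, so that a span is a candidate only if the tree supports it, regardless of its length. This is not the paper's segment construction algorithm; it only illustrates what parse-tree-derived spans could look like. The nested-tuple tree encoding, the function names and the example parse are all assumptions.

```python
def _collect(tree, start, spans):
    """Add the inclusive (start, end) span of every node under `tree`
    to `spans` and return the position after the last token covered."""
    if isinstance(tree, str):  # a leaf is a single token
        spans.add((start, start))
        return start + 1
    _label, *children = tree  # an internal node is (label, child, child, ...)
    pos = start
    for child in children:
        pos = _collect(child, pos, spans)
    spans.add((start, pos - 1))
    return pos


def constituent_spans(tree):
    """Return the sorted token spans of all constituents and tokens."""
    spans = set()
    _collect(tree, 0, spans)
    return sorted(spans)


# Hypothetical parse of "He said the truth".
tree = ("S", ("NP", "He"), ("VP", "said", ("NP", "the", "truth")))
spans = constituent_spans(tree)
```

In this example "the truth" (span 2 to 3) and "said the truth" (span 1 to 3) are candidates, while "said the" (span 1 to 2) is not, because no constituent covers exactly those tokens.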

Segment Construction Algorithm

The function $commGroup(U_{i},...,U_{j})$ returns true if the parent nodes of $U_{i},...,U_{j}$ have the same rightmost child in their subtrees; otherwise it returns false. The candidate segments generated above are then validated using.

TBD

Features:

Experimental Results

A token-level CRF-based approach (Breck et al., 2007) is used as the baseline on the MPQA dataset.

Study Plan

This paper uses semi-CRFs for the labeling task, so the reader should first read about semi-CRFs.

Sunita Sarawagi and William W. Cohen. Semi-Markov Conditional Random Fields for Information Extraction.

@@ Line 29: / Line 29: @@
 == Methodology ==
 === Semi-CRF ===
-A sentence s is divided into segments <math> <s_1,...,s_n> </math>. Where <math> s_i</math> = <math>( t_i, u_i, y_i )</math> such that <math>t_i </math> is the start position of segment <math>s_i</math> and <math>u_i</math> is the end position of the sentence <math> s_i</math>,<math> y_i </math> is the label of the segment. Segment length is limited to maximum length seen in the corpus. Feature function :
+A sentence s is divided into segments <math> <s_1,...,s_n> </math>. Where <math> s_i</math> = <math>( t_i, u_i, y_i )</math> such that <math>t_i </math> is the start position of segment <math>s_i</math>,  <math>u_i</math> is the end position and <math> y_i </math> is the label of the segment. Segment length is limited to maximum length seen in the corpus. Feature function :
 <math> g(x,s,i) = g(s,t_i,u_i,y_i,y_{i-1})</math>.
 The conditional probability of a segmentation s give a sequence x is defined as

