Semantic Role Labeling as Sequential Tagging
 


Citation

Luis Marquez, Pere Comas, Jesus Gimenez, Neus Catala, "Semantic Role Labeling as Sequential Tagging", CoNLL 2005

Online version

Click here to download

Introduction

This paper addresses Semantic Role Labeling (SRL) of sentences by modeling it as a sequential BIO-tagging problem. It uses AdaBoost with fixed-depth decision trees. The authors implement two separate SRL systems that use syntactic features at different levels: the first, PP_UPC, uses shallow syntactic features from phrase chunks and clauses (generated with the UPC chunker and clause detector), while the second, FP_CHA, uses deep syntactic features obtained from Charniak's syntactic parse trees. SRL performance is evaluated first for each system separately, and then for a combination of the two.

Dataset

The dataset used was the PropBank corpus, which is the Penn Treebank corpus with semantic role annotation.

Methodology

As the chunked or parsed data is hierarchical in nature, the authors first pre-processed the data to sequentialize it. Sequentialization on chunked data was done by selecting the top level chunks for each clause identified. Sequentialization on parsed data was done by selecting sibling nodes of the predicate (verb) node along the path from the predicate node to the root node of the syntactic parse tree. The F-scores for this sequentialization process were 97.79 and 94.91 for the chunk-sequentialization and the parse-sequentialization respectively.
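As a rough illustration of the parse-tree sequentialization just described, the sketch below collects the siblings of every node on the path from the predicate up to the root, in sentence order. The Node class, add_child helper, and the toy tree are assumptions for illustration, not the authors' data structures.

<pre>
# Minimal sketch of predicate-centred sequentialization over a parse tree.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    label: str                                   # syntactic category or token (illustrative)
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None

def add_child(parent: Node, child: Node) -> Node:
    child.parent = parent
    parent.children.append(child)
    return child

def sequentialize(predicate: Node) -> List[Node]:
    """Siblings of every node on the predicate-to-root path, in sentence order."""
    left: List[Node] = []
    right: List[Node] = []
    node = predicate
    while node.parent is not None:
        siblings = node.parent.children
        i = siblings.index(node)
        left = siblings[:i] + left               # higher-level left siblings sit further left
        right = right + siblings[i + 1:]         # higher-level right siblings sit further right
        node = node.parent
    return left + [predicate] + right

# Toy tree:  S -> (NP-subj, VP -> (VBD-predicate, NP-obj))
root = Node("S")
subj = add_child(root, Node("NP-subj"))
vp = add_child(root, Node("VP"))
pred = add_child(vp, Node("VBD-predicate"))
obj = add_child(vp, Node("NP-obj"))
print([n.label for n in sequentialize(pred)])
# -> ['NP-subj', 'VBD-predicate', 'NP-obj']
</pre>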
The nodes or chunks selected after sequentialization were then tagged with BIO labels according to whether they fall at the beginning of, inside, or outside a semantic argument. A total of 37 semantic argument types (or semantic roles) were considered, giving 37*2+1 = 75 labels for tagging the sequential data.
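To make the BIO encoding concrete, here is a small sketch (my own illustration, not the authors' code) that turns non-overlapping argument spans over the sequentialized constituents into B-/I-/O tags; the function and argument names are assumptions.

<pre>
def bio_encode(num_constituents, spans):
    """One BIO tag per constituent; `spans` maps a role (e.g. "A0") to an
    inclusive (start, end) index pair, and spans are assumed non-overlapping."""
    tags = ["O"] * num_constituents
    for role, (start, end) in spans.items():
        tags[start] = "B-" + role
        for i in range(start + 1, end + 1):
            tags[i] = "I-" + role
    return tags

# Example: six constituents, A0 on the first, A1 on the third and fourth.
print(bio_encode(6, {"A0": (0, 0), "A1": (2, 3)}))
# -> ['B-A0', 'O', 'B-A1', 'I-A1', 'O', 'O']
</pre>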
The learning algorithm was then applied to this labeled training data: AdaBoost with fixed-depth decision trees. The decision trees had a maximum depth of 4 and used syntactic features, as described in the next section. The problem was modeled as one-vs-all (OVA) classification, with AdaBoost used as the binary classifier over the syntactic constituents (chunks for PP_UPC and parse-tree nodes for FP_CHA). Additional constraints, such as the BIO structure and non-overlapping arguments, were enforced on the classification output.
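A rough analogue of this learning setup, using scikit-learn's AdaBoost rather than the authors' AdaBoost.MH implementation, would train one-vs-all boosted depth-4 decision trees over binarized features. The feature matrix and labels below are toy placeholders.

<pre>
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.multiclass import OneVsRestClassifier
from sklearn.tree import DecisionTreeClassifier

# One-vs-all AdaBoost over fixed-depth (here depth-4) decision trees.
base = DecisionTreeClassifier(max_depth=4)
ova = OneVsRestClassifier(AdaBoostClassifier(base, n_estimators=100))

# Toy stand-ins for the binarized syntactic features (X) and BIO labels (y).
X = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]])
y = np.array(["B-A0", "I-A0", "O", "B-A1"])
ova.fit(X, y)
print(ova.predict(X))
</pre>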

Features Used

The following syntactic features, grouped into four categories, were used (additional details can be found in the references cited in the paper); a small feature-extraction sketch follows the list:

(1) On the verb predicate:
Form; Lemma; POS tag; Chunk type and type of verb phrase in which the verb is included: single-word or multi-word; Verb voice: active, passive, copulative, infinitive, or progressive; Binary flag indicating whether the verb is the start/end of a clause.
Subcategorization, i.e., the phrase structure rule expanding the verb parent node.


(2) On the focus constituent:
Type; Head: extracted using common head-word rules; if the first element is a PP chunk, then the head of the first NP is extracted;
First and last words and POS tags of the constituent.
POS sequence: if it is less than 5 tags long; 2/3/4-grams of the POS sequence. Bag-of-words of nouns, adjectives, and adverbs in the constituent.
TOP sequence: sequence of types of the top-most syntactic elements in the constituent (if it is less than 5 elements long); in the case of full parsing this corresponds to the right-hand side of the rule expanding the constituent node; 2/3/4-grams of the TOP sequence.
Governing category as described in (Gildea and Jurafsky, 2002).
NamedEnt, indicating if the constituent embeds or strictly-matches a named entity along with its type.
TMP, indicating if the constituent embeds or strictly matches a temporal keyword (extracted from AM-TMP arguments of the training set).


(3) Context of the focus constituent:
Previous and following words and POS tags of the constituent.
The same features characterizing focus constituents are extracted for the two previous and following tokens, provided they are inside the clause boundaries of the codified region.


(4) Relation between predicate and constituent:
Relative position; Distance in words and chunks; Level of embedding with respect to the constituent: in number of clauses.
Constituent path as described in (Gildea and Jurafsky, 2002); All 3/4/5-grams of path constituents beginning at the verb predicate or ending at the constituent.
Partial parsing path as described in (Carreras et al., 2004); All 3/4/5-grams of path elements beginning at the verb predicate or ending at the constituent.
Syntactic frame as described by Xue and Palmer (2004)
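As a rough illustration of how feature types like these are typically turned into sparse binary "type=value" features for the boosting classifier, here is a small sketch; the dictionary fields and names are assumptions, not the authors' representation.

<pre>
def extract_features(constituent, predicate):
    """Build 'type=value' strings for a focus constituent and its verb predicate.
    Both arguments are hypothetical dicts with precomputed attributes."""
    position = "before" if constituent["end"] < predicate["index"] else "after"
    return [
        "pred_lemma=" + predicate["lemma"],
        "pred_pos=" + predicate["pos"],
        "pred_voice=" + predicate["voice"],
        "const_type=" + constituent["type"],
        "const_head=" + constituent["head"],
        "first_word=" + constituent["words"][0],
        "last_word=" + constituent["words"][-1],
        "rel_position=" + position,
        "distance_words=" + str(abs(predicate["index"] - constituent["start"])),
    ]

# Example use with toy values:
print(extract_features(
    {"type": "NP", "head": "cat", "words": ["the", "black", "cat"],
     "start": 0, "end": 2},
    {"lemma": "chase", "pos": "VBD", "voice": "active", "index": 3},
))
</pre>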

Experiments and Results

Because the training data and the label and feature spaces were huge, the authors applied some filtering and simplification. First, infrequently occurring labels were discarded: the 41 most frequent labels were kept for PP_UPC and the 35 most frequent for FP_CHA. The remaining labels were collectively tagged "other" and treated as "O" whenever the system assigned that tag to a constituent. Second, features occurring fewer than 15 times in the training set were discarded. The final number of features came to 105,175 for the PP_UPC system and 80,742 for the FP_CHA system.
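The thresholds above come from the text; the Counter-based implementation below is only a sketch of how such frequency filtering might look.

<pre>
from collections import Counter

def filter_labels(labels, k):
    """Keep the k most frequent labels and map the rest to "other"."""
    top = {label for label, _ in Counter(labels).most_common(k)}
    return [label if label in top else "other" for label in labels]

def filter_features(instances, min_count=15):
    """Drop features seen fewer than `min_count` times across all instances
    (each instance is a list of 'type=value' feature strings)."""
    counts = Counter(f for feats in instances for f in feats)
    return [[f for f in feats if counts[f] >= min_count] for feats in instances]
</pre>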

The results obtained by the individual systems, and by their combination, on the development set are presented below in Fig-1 ("Perfect Props" denotes the accuracy in identifying the correct predicate, i.e., the verb). The deep-parsing system outperformed the shallow-parsing one, but it is notable that the latter performed competitively. The authors also found that the arguments predicted by the two systems were quite different, which motivated combining them to get the best of each. The final results of the combined system are shown in Fig-2.
Fig-1 (fig1.jpg): development-set results of the individual PP_UPC and FP_CHA systems.

Fig-2 (fig2.jpg): development-set results of the combined system.