Semantic Role Labeling as Sequential Tagging



Luis Marquez, Pere Comas, Jesus Gimenez, Neus Catala, "Semantic Role Labeling as Sequential Tagging", CoNLL 2005



This paper addresses Semantic Role Labeling (SRL) by modeling it as a sequential BIO-tagging problem. It uses boosting, specifically AdaBoost, with fixed-depth decision trees. The authors implement two separate SRL systems that use syntactic features at different levels. The first system uses shallow syntactic features from phrase chunks and clauses (generated using the UPC chunker and clauser), while the second uses deep syntactic features obtained from Charniak's syntactic parse trees. SRL performance is evaluated first for each system separately and then for a combination of the two.


The dataset used was the PropBank corpus, which is the Penn Treebank corpus with semantic role annotation.


As the chunked or parsed data is hierarchical in nature, the authors first pre-processed the data to sequentialize it. Sequentialization on chunked data was done by selecting the top level chunks for each clause identified. Sequentialization on parsed data was done by selecting sibling nodes of the predicate (verb) node along the path from the predicate node to the root node of the syntactic parse tree. The F-scores for this sequentialization process were 97.79 and 94.91 for the chunk-sequentialization and the parse-sequentialization respectively.
The nodes or chunks selected by sequentialization were then tagged with B, I, or O labels, marking whether each one falls at the beginning of, inside, or outside a semantic argument. A total of 37 semantic argument types (semantic roles) were considered, so 37*2+1=75 labels were used to label the sequential data.
A learning algorithm was then applied to this labeled training data: AdaBoost with fixed-depth decision trees. The decision trees had a maximum depth of 4 and used the syntactic features described in the next section. The problem was modeled as one-vs-all (OVA) classification, so AdaBoost was used for binary classification of the syntactic constituents (chunks for the shallow-syntax system and parse-tree nodes for the deep-syntax system). Additional constraints, such as a well-formed BIO structure and non-overlapping arguments, were applied to the classification task.
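
The parse-tree sequentialization and the BIO labeling described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Node` class, the function names, and the toy tree are hypothetical, and keeping the predicate itself in the output sequence is an assumption made here for readability. Role names such as `A0` and `AM-LOC` follow PropBank-style conventions and are used only as examples.

```python
class Node:
    """Minimal parse-tree node (hypothetical helper, not from the paper)."""

    def __init__(self, label, children=()):
        self.label = label
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


def sequentialize(predicate):
    """Collect the siblings met while climbing from the predicate to the root.

    At each ancestor, every child except the one on the path is kept.
    Left siblings found higher in the tree occur earlier in the sentence,
    right siblings found higher occur later, so the result is in sentence order.
    Assumption: the predicate node itself is kept in the sequence.
    """
    left, right = [], []
    node = predicate
    while node.parent is not None:
        siblings = node.parent.children
        idx = siblings.index(node)
        left = siblings[:idx] + left
        right = right + siblings[idx + 1:]
        node = node.parent
    return left + [predicate] + right


def label_set(roles):
    """One B- and one I- label per role, plus the single O label."""
    return ["O"] + [prefix + role for role in roles for prefix in ("B-", "I-")]


def bio_encode(length, spans):
    """Turn (start, end_inclusive, role) spans over the sequence into BIO tags."""
    tags = ["O"] * length
    for start, end, role in spans:
        tags[start] = "B-" + role
        for i in range(start + 1, end + 1):
            tags[i] = "I-" + role
    return tags


# Toy tree for "The cat chased a mouse in the garden ."
verb = Node("VBD")
tree = Node("S", [Node("NP"), Node("VP", [verb, Node("NP"), Node("PP")]), Node(".")])

sequence = sequentialize(verb)
labels = [node.label for node in sequence]
tags = bio_encode(len(sequence), [(0, 0, "A0"), (2, 2, "A1"), (3, 3, "AM-LOC")])

for label, tag in zip(labels, tags):
    print(label, tag)
```

With 37 roles, `label_set` yields the 75 labels mentioned above. When an argument spans several selected constituents, the first receives the `B-` label and the rest receive `I-` labels.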

Features Used

The following syntactic features, grouped into four categories, were used (additional details can be found in the references cited in the paper):

(1) On the verb predicate:
Form; Lemma; POS tag; Chunk type and Type of verb phrase in which verb is included: single-word or multi-word; Verb voice: active, passive, copulative, infinitive, or progressive; Binary flag indicating if the verb is a start/end of a clause.
Subcategorization, i.e., the phrase structure rule expanding the verb parent node.

(2) On the focus constituent:
Type; Head: extracted using common head-word rules; if the first element is a PP chunk, then the head of the first NP is extracted;
First and last words and POS tags of the constituent.
POS sequence: if it is less than 5 tags long; 2/3/4-grams of the POS sequence. Bag-of-words of nouns, adjectives, and adverbs in the constituent.
TOP sequence: sequence of types of the top-most syntactic elements in the constituent (if it is less than 5 elements long); in the case of full parsing this corresponds to the right-hand side of the rule expanding the constituent node; 2/3/4-grams of the TOP sequence.
Governing category as described in (Gildea and Jurafsky, 2002).
NamedEnt, indicating if the constituent embeds or strictly-matches a named entity along with its type.
TMP, indicating if the constituent embeds or strictly matches a temporal keyword (extracted from AM-TMP arguments of the training set).

(3) Context of the focus constituent:
Previous and following words and POS tags of the constituent.
The same features characterizing focus constituents are extracted for the two previous and following tokens, provided they are inside the clause boundaries of the codified region.

(4) Relation between predicate and constituent:
Relative position; Distance in words and chunks; Level of embedding with respect to the constituent: in number of clauses.
Constituent path as described in (Gildea and Jurafsky, 2002); All 3/4/5-grams of path constituents beginning at the verb predicate or ending at the constituent.
Partial parsing path as described in (Carreras et al., 2004); All 3/4/5-grams of path elements beginning at the verb predicate or ending at the constituent.
Syntactic frame as described by Xue and Palmer (2004).
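
As an illustration of the n-gram style features listed above, the following Python sketch builds the POS-sequence features of a constituent and the path n-grams anchored at the verb predicate or at the constituent. The feature-name strings, the function names, and the representation of a path as a plain list of node labels (without up/down direction markers) are assumptions made for this example. Reading "beginning at the verb predicate or ending at the constituent" as "the first n and the last n path elements" is also an interpretation, not something the paper summary confirms.

```python
def ngrams(seq, n):
    """All contiguous n-grams of a sequence, as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def pos_sequence_features(pos_tags):
    """Full POS sequence (only if shorter than 5 tags) plus its 2/3/4-grams."""
    feats = []
    if len(pos_tags) < 5:
        feats.append("POSSEQ=" + "_".join(pos_tags))
    for n in (2, 3, 4):
        feats.extend("POS%dG=" % n + "_".join(gram) for gram in ngrams(pos_tags, n))
    return feats


def anchored_path_ngrams(path):
    """3/4/5-grams of the path that start at the predicate or end at the constituent.

    `path` is assumed to run from the predicate to the focus constituent.
    """
    feats = []
    for n in (3, 4, 5):
        if len(path) >= n:
            feats.append("PATH%dG_FROM_VERB=" % n + ">".join(path[:n]))
            feats.append("PATH%dG_TO_CONST=" % n + ">".join(path[-n:]))
    return feats


pos_feats = pos_sequence_features(["DT", "JJ", "NN"])
path_feats = anchored_path_ngrams(["VBD", "VP", "S", "NP"])
print(pos_feats)
print(path_feats)
```

Each resulting string acts as one binary feature, which suits decision-tree weak learners that test for the presence or absence of a feature.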

Experiments and Results

Because the training data, class space, and feature space were huge, the authors applied some filtering and simplification. First, infrequently occurring labels were discarded: only the 41 most frequent labels were kept for one system and the 35 most frequent for the other. The remaining labels were collectively tagged "other", and a constituent was treated as "O" whenever the system assigned it this label. Second, features occurring fewer than 15 times in the training set were discarded. The final number of features came down to 105,175 for one system and 80,742 for the other.
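
The two filtering steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the toy data is invented for the example, and whether the "O" label counts toward the most frequent labels is not stated in the summary (here it is counted like any other label).

```python
from collections import Counter


def keep_frequent_labels(label_seqs, top_k):
    """Keep the top_k most frequent labels and map every other label to "other"."""
    counts = Counter(label for seq in label_seqs for label in seq)
    kept = {label for label, _ in counts.most_common(top_k)}
    return [[label if label in kept else "other" for label in seq] for seq in label_seqs]


def frequent_features(instances, min_count=15):
    """Return the features that occur at least min_count times in the training set."""
    counts = Counter(feat for feats in instances for feat in feats)
    return {feat for feat, count in counts.items() if count >= min_count}


def decode_label(predicted):
    """At prediction time, a constituent labeled "other" is treated as "O"."""
    return "O" if predicted == "other" else predicted


# Toy data, invented for the example
train_labels = [
    ["B-A0", "O", "B-A1"],
    ["B-A0", "O", "B-AM-MNR"],
    ["B-A0", "O", "B-A1"],
]
filtered = keep_frequent_labels(train_labels, top_k=3)
vocab = frequent_features([["w=the"]] * 15 + [["w=rare"]] * 14)
print(filtered)
print(vocab)
```

Mapping rare labels to a single "other" class reduces the number of binary classifiers needed in the one-vs-all setup, at the cost of never predicting those rare roles.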

The results obtained on the development set by the individual systems and by their combination are presented in Fig-1 ("Perfect Props" denotes the percentage of propositions, i.e. predicates, whose arguments are all labeled correctly). The deep-parsing system outperformed the shallow-parsing one, although the latter remained competitive. The authors found that the arguments predicted by the two systems differed considerably, which motivated combining them to get the best of each. The final results of the combined system are shown in Fig-2.