Semantic Role Labeling as Sequential Tagging


Citation

Lluís Màrquez, Pere Comas, Jesús Giménez, Neus Català, "Semantic Role Labeling as Sequential Tagging", CoNLL 2005

Introduction

This paper addresses Semantic Role Labeling (SRL) of sentences by modeling the problem as a sequential BIO-tagging problem, using AdaBoost with fixed-depth decision trees. The authors implement two separate SRL systems that use syntactic features at different levels: the first uses shallow syntactic features from phrase chunks and clauses (generated with the UPC chunker and clause detector), while the second uses deep syntactic features obtained from Charniak's syntactic parse trees. SRL performance is evaluated first for each system separately, and then for a combination of the two.

Dataset

The dataset used was the Propbank corpus, which is the Penn Treebank corpus with semantic role annotation.

Methodology

As the chunked and parsed data are hierarchical in nature, the authors first pre-processed them to sequentialize them. Sequentialization of chunked data was done by selecting the top-level chunks of each identified clause. Sequentialization of parsed data was done by selecting, for each node on the path from the predicate (verb) node to the root of the syntactic parse tree, that node's siblings, as sketched below. The F-scores for this sequentialization process were 97.79 for chunk sequentialization and 94.91 for parse sequentialization.
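To make this concrete, here is a minimal sketch of the parse-tree sequentialization step in Python. The Node class and the example tree are illustrative assumptions, not the paper's actual data structures (the paper works over Charniak parses), and a real implementation would re-order the selected nodes by sentence position.

    # Sketch: collect the siblings of every node on the path from the
    # predicate (verb) node up to the root of the parse tree.
    class Node:
        def __init__(self, label, children=None):
            self.label = label
            self.children = children or []
            self.parent = None
            for child in self.children:
                child.parent = self

    def sequentialize(predicate):
        """Return the constituents selected for the tagging sequence."""
        selected = []
        node = predicate
        while node.parent is not None:
            # keep the siblings, not the path node itself
            selected.extend(s for s in node.parent.children if s is not node)
            node = node.parent
        return selected

    # Example: (S (NP ...) (VP (VBD gave) (NP ...) (PP ...)))
    vbd = Node("VBD")
    vp = Node("VP", [vbd, Node("NP"), Node("PP")])
    s = Node("S", [Node("NP"), vp])
    print([n.label for n in sequentialize(vbd)])  # ['NP', 'PP', 'NP']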
The nodes or chunks selected after sequentialization were then tagged with B, I, or O labels according to whether they fall at the beginning of, inside, or outside a semantic argument. A total of 37 semantic argument types (or semantic roles) were considered, giving a total of 37*2+1=75 labels for tagging the sequential data.
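The label count follows directly; a small illustration (the role names shown are a sample of the 37 PropBank argument types):

    # Each argument type gets a B- and an I- tag; one O tag is shared by
    # all constituents outside any argument: 37*2 + 1 = 75 labels.
    roles = ["A0", "A1", "A2", "AM-TMP", "AM-LOC"]  # 5 of the 37 types
    labels = [f"{p}-{r}" for r in roles for p in ("B", "I")] + ["O"]
    assert len(labels) == 2 * len(roles) + 1        # 75 with all 37 roles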
A learning algorithm was then applied to this labeled training data: AdaBoost with fixed-depth decision trees. The decision trees had a maximum depth of 4 and used the syntactic features described in the next section. The problem was modeled as one-vs-all (OVA) classification, and AdaBoost thus performed binary classification of the syntactic constituents (chunks for the chunk-based system and parse-tree nodes for the parse-based system). Additional constraints, such as BIO structure and non-overlap of arguments, were enforced on the classification output.
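As a rough analogy for this setup, the sketch below uses scikit-learn's AdaBoost over depth-4 decision trees in a one-vs-all arrangement. The paper uses its own AdaBoost implementation, and X_train/y_train are hypothetical constituent feature vectors and BIO labels, so treat this as a schematic, not the authors' system.

    # One-vs-all AdaBoost over fixed-depth decision trees (schematic).
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.tree import DecisionTreeClassifier

    base = DecisionTreeClassifier(max_depth=4)      # fixed-depth weak learner
    ova = OneVsRestClassifier(AdaBoostClassifier(base, n_estimators=200))
    # ova.fit(X_train, y_train)                     # y: the 75 BIO labels
    # scores = ova.decision_function(X_test)
    # A decoding step would then enforce the BIO-structure and
    # non-overlapping-argument constraints on the per-label scores.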

Features Used

The following syntactic features, grouped into four categories, were used (additional details can be found in the references cited in the paper); a toy extraction sketch follows the list:

(1) On the verb predicate:
Form; Lemma; POS tag; Chunk type; Type of verb phrase in which the verb is included: single-word or multi-word; Verb voice: active, passive, copulative, infinitive, or progressive; Binary flag indicating whether the verb is the start/end of a clause.
Subcategorization, i.e., the phrase structure rule expanding the verb parent node.


(2) On the focus constituent:
Type; Head: extracted using common head-word rules; if the first element is a PP chunk, then the head of the first NP is extracted;
First and last words and POS tags of the constituent.
POS sequence: if it is less than 5 tags long; 2/3/4-grams of the POS sequence. Bag-of-words of nouns, adjectives, and adverbs in the constituent.
TOP sequence: sequence of types of the top-most syntactic elements in the constituent (if it is less than 5 elements long); in the case of full parsing this corresponds to the right-hand side of the rule expanding the constituent node; 2/3/4-grams of the TOP sequence.
Governing category as described in (Gildea and Jurafsky,2002).
NamedEnt, indicating if the constituent embeds or strictly matches a named entity, along with its type.
TMP, indicating if the constituent embeds or strictly matches a temporal keyword (extracted from AM-TMP arguments of the training set).


(3) Context of the focus constituent:
Previous and following words and POS tags of the constituent.
The same features characterizing focus constituents are extracted for the two previous and following tokens, provided they are inside the clause boundaries of the codified region.


(4) Relation between predicate and constituent:
Relative position; Distance in words and chunks; Level of embedding with respect to the constituent: in number of clauses.
Constituent path as described in (Gildea and Jurafsky,2002); All 3/4/5-grams of path constituents beginning at the verb predicate or ending at the constituent.
Partial parsing path as described in (Carreras et al., 2004); All 3/4/5-grams of path elements beginning at the verb predicate or ending at the constituent.
Syntactic frame as described by Xue and Palmer (2004).
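To make the four categories concrete, here is a toy sketch of assembling (feature type, feature value) pairs for one predicate-constituent pair; the input dictionaries and field names are illustrative assumptions, not the paper's data structures.

    # Toy feature extraction for one (predicate, constituent) pair.
    def extract_features(pred, cons, ctx):
        f = {}
        # (1) on the verb predicate
        f["verb_lemma"] = pred["lemma"]
        f["verb_pos"] = pred["pos_tag"]
        f["verb_voice"] = pred["voice"]             # e.g. "active", "passive"
        # (2) on the focus constituent
        f["type"] = cons["type"]                    # e.g. "NP"
        f["head"] = cons["head"]
        f["first_word"], f["last_word"] = cons["words"][0], cons["words"][-1]
        # (3) context of the focus constituent
        f["prev_word"] = ctx.get("prev_word", "<s>")
        f["next_word"] = ctx.get("next_word", "</s>")
        # (4) relation between predicate and constituent
        f["rel_position"] = "before" if cons["end"] < pred["index"] else "after"
        f["dist_words"] = min(abs(cons["start"] - pred["index"]),
                              abs(cons["end"] - pred["index"]))
        return f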

CRF Model

The CRF was defined over the tree structure of the sentence as

    p(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \exp \sum_{c \in C} \boldsymbol{\lambda} \cdot \mathbf{f}(c, \mathbf{y}_c, \mathbf{x})

where C is the set of cliques in the observation tree, \boldsymbol{\lambda} are the model's parameters, and \mathbf{f} is the function that maps the label of a clique to a vector of scalar values.
The cliques considered were single-node cliques (just one node in the syntactic tree) and two-node cliques (parent and child nodes). The CRF model can thus be restated as

    p(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \exp \left( \sum_{v \in V} \boldsymbol{\lambda} \cdot \mathbf{g}(v, \mathbf{y}_v, \mathbf{x}) + \sum_{(u,v) \in E} \boldsymbol{\mu} \cdot \mathbf{h}(u, v, \mathbf{y}_u, \mathbf{y}_v, \mathbf{x}) \right)

where the actual feature function is divided into the single-node feature function \mathbf{g} and the two-node feature function \mathbf{h}, defined over the nodes V and parent-child edges E of the tree.
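A minimal sketch of the unnormalized score from the restated model, assuming the tree is given as node and edge lists and that g and h return dense feature vectors; the partition function Z(x) additionally sums this score over all labelings, which the tree structure makes tractable via belief propagation.

    import math

    # Unnormalized tree-CRF score: sum of single-node and parent-child
    # (two-node) clique potentials, exponentiated as in the formula above.
    def unnormalized_score(nodes, edges, y, g, h, lam, mu):
        total = 0.0
        for v in nodes:                             # single-node cliques
            total += sum(l * fv for l, fv in zip(lam, g(v, y[v])))
        for u, v in edges:                          # parent-child cliques
            total += sum(m * fv for m, fv in zip(mu, h(u, v, y[u], y[v])))
        return math.exp(total)
    # p(y|x) = unnormalized_score(...) / Z(x)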

Features Used

As the cliques considered are single-node and two-node cliques, the features were also defined for both single nodes and parent-child pairs. Many syntactic features were used; rather than describing each one, this summary points to the references given in the paper. The syntactic feature types were turned into binary feature functions by combining (feature type, feature value) pairs with a label (for single-node cliques) or a label pair (for two-node cliques), whenever that (feature type, feature value) pair was seen at least once in the training data; a sketch of this binarization follows the list below.
The different feature types used were:
Basic features: {Head word, head PoS, phrase syntactic category, phrase path, position relative to the predicate, surface distance to the predicate, predicate lemma, predicate token, predicate voice, predicate sub-categorisation, syntactic frame}. These features are common to many SRL systems and are described in Xue and Palmer (2004).
Context features: {Head word of first NP in prepositional phrase, left and right sibling head words and syntactic categories, first and last word in phrase yield and their PoS, parent syntactic category and head word}. These features are described in Pradhan et al (2005).
Common ancestor of the verb: The syntactic category of the deepest shared ancestor of both the verb and node.
Feature conjunctions: The following features were conjoined: { predicate lemma + syntactic category, predicate lemma + relative position, syntactic category + first word of the phrase}.
Default feature: This feature is always on, which allows the classifier to model the prior probability distribution over the possible argument labels.
Joint features: These features were only defined over pair-wise cliques: {whether the parent and child head words do not match, parent syntactic category + child syntactic category, parent relative position + child relative position, parent relative position + child relative position + predicate PoS + predicate lemma}.
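A small sketch of the binarization described above: each (feature type, feature value) pair is crossed with a label or label pair, and only combinations observed in training become features; the dictionary-based index is an assumption of this sketch.

    # Build binary feature functions seen at least once in training.
    # `label` is a node label for single-node cliques, or a (parent, child)
    # label pair for two-node cliques.
    def build_feature_index(training_cliques):      # iterable of (feats, label)
        index = {}
        for feats, label in training_cliques:       # feats: {type: value}
            for ftype, fvalue in feats.items():
                index.setdefault((ftype, fvalue, label), len(index))
        return index

    def active_features(feats, label, index):
        """Indices of the binary features that fire for this clique."""
        return [index[(t, v, label)] for t, v in feats.items()
                if (t, v, label) in index]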

Experimental Results and Conclusion

The parsed training sentences yielded 90,388 predicates and 1,971,985 binary features (the single-node functions \mathbf{g} and two-node functions \mathbf{h}). Precision, recall, and F-scores are shown in the table below.
(Table: precision, recall, and F-score results.)

Although the modeling of the problem is neat, the reported results were not on par with the best systems that competed in the CoNLL shared task. Màrquez et al. showed in their paper that modeling the SRL problem as a sequential BIO-tagging problem still gives considerably better results; they used a combination of deep and shallow syntactic features and a boosting technique for the BIO tagging.