Semantic Role Labeling with CRFs
Latest revision as of 21:11, 15 October 2011
Citation
Trevor Cohn, Philip Blunsom, "Semantic Role Labeling with Conditional Random Fields", CoNLL 2005
Online version
Introduction
This paper addresses Semantic Role Labeling (SRL) of sentences using Conditional Random Fields (CRFs). It was the first attempt to solve the SRL problem with a CRF. The authors defined the CRF over the syntactic parse tree of the sentence, rather than over the linear sequence of words as is usually done for tasks such as Named Entity Recognition or Part-of-Speech tagging. The motivation comes from the nature of semantic role labeling itself: phrases are labeled with their semantic roles with respect to a particular constituent of the sentence, the predicate (the verb). The authors conjectured that, for this reason, a linear-chain CRF is not an intuitive model for SRL.
The problem of SRL is usually broken into two parts: identifying the candidate phrases that should receive semantic roles, and predicting the role to assign to each identified phrase. The approach in this paper does both in a single pass over the syntactic tree.
Dataset Used
The dataset used was the PropBank corpus, which is the Penn Treebank corpus with semantic role annotation.
CRF Model
The CRF was defined over the tree structure of the sentence as:
[[File:crf_coh.jpg]]
where <math>C</math> is the set of cliques in the observation tree, <math>\lambda_k</math> are the model's parameters, and <math>f</math> is the feature function that maps the labeling of a clique to a vector of scalar values.
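For reference, assuming the standard log-linear CRF parameterisation, a formula consistent with the definitions above would be the following. The partition function <math>Z(\mathbf{x})</math> and the notation <math>\mathbf{y}_c</math> for the labels of the nodes in clique <math>c</math> are assumptions introduced here, not symbols taken from the summary.

```latex
p(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \exp \sum_{c \in C} \sum_{k} \lambda_k \, f_k(c, \mathbf{y}_c, \mathbf{x})
```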
The cliques considered were single-node cliques (one node in the syntactic tree) and two-node cliques (a parent node and one of its children). The CRF model can thus be restated as
[[File:crf_coh_alt.jpg]]
where the feature function <math>f</math> is divided into a single-node feature function <math>g</math> and a two-node feature function <math>h</math>.
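Assuming a standard log-linear CRF with partition function <math>Z(\mathbf{x})</math>, and writing <math>C_1</math> for the single-node cliques and <math>C_2</math> for the parent-child cliques (both names are introduced here for illustration), the restated model would plausibly take this form:

```latex
p(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \exp \left( \sum_{v \in C_1} \sum_{k} \lambda_k \, g_k(v, y_v, \mathbf{x}) + \sum_{(u,v) \in C_2} \sum_{j} \lambda_j \, h_j(u, v, y_u, y_v, \mathbf{x}) \right)
```

The first sum scores each node's label on its own; the second scores the labels of each parent-child pair jointly.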
Features Used
As the cliques considered are single-node and two-node cliques, features were defined for both single nodes and parent-child pairs. Many syntactic features were used; they are not each described here, as references for them can be found in the paper. The feature types were turned into binary functions <math>g</math> and <math>h</math> by combining each (feature type, feature value) pair with a label (for a single node) or a label pair (for a two-node clique), provided that the feature type and value were seen at least once in the training data.
The different feature types used were:
Basic features: {Head word, head PoS, phrase syntactic category, phrase path, position relative to the predicate, surface distance to the predicate, predicate lemma, predicate token, predicate voice, predicate sub-categorisation, syntactic frame}.
Context features: {Head word of the first NP in a prepositional phrase, left and right sibling head words and syntactic categories, first and last word in the phrase yield and their PoS, parent syntactic category and head word}.
Common ancestor of the verb: The syntactic category of the deepest shared ancestor of the verb and the node.
Feature conjunctions: The following features were conjoined: {predicate lemma + syntactic category, predicate lemma + relative position, syntactic category + first word of the phrase}.
Default feature: This feature is always on, which allows the classifier to model the prior probability distribution over the possible argument labels.
Joint features: These features were only defined over pair-wise cliques: {whether the parent and child head words do not match, parent syntactic category + child syntactic category, parent relative position + child relative position, parent relative position + child relative position + predicate PoS + predicate lemma}.
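The construction of the binary functions described at the start of this section can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the dictionary representation of a node's features, and the toy training data are all hypothetical, and it assumes that a binary feature is instantiated only for (feature type, feature value, label) combinations observed in training.

```python
def collect_templates(training_nodes, training_edges):
    """Gather the feature/label combinations observed in training.

    training_nodes: list of (features_dict, label)
    training_edges: list of (features_dict, parent_label, child_label)
    Both input formats are hypothetical, chosen for illustration.
    """
    node_templates = set()
    for features, label in training_nodes:
        for ftype, fvalue in features.items():
            node_templates.add((ftype, fvalue, label))
    edge_templates = set()
    for features, parent_label, child_label in training_edges:
        for ftype, fvalue in features.items():
            edge_templates.add((ftype, fvalue, parent_label, child_label))
    return node_templates, edge_templates


def make_g(ftype, fvalue, label):
    """Single-node binary feature: fires when the value and label both match."""
    def g(features, y):
        return int(features.get(ftype) == fvalue and y == label)
    return g


def make_h(ftype, fvalue, parent_label, child_label):
    """Two-node binary feature: fires when the value and both labels match."""
    def h(features, y_parent, y_child):
        return int(
            features.get(ftype) == fvalue
            and y_parent == parent_label
            and y_child == child_label
        )
    return h


# Toy data (made up): two labeled nodes and one labeled parent-child pair.
nodes = [
    ({"head_word": "dog", "category": "NP"}, "A0"),
    ({"head_word": "bone", "category": "NP"}, "A1"),
]
edges = [({"parent_child_category": "VP+NP"}, "NONE", "A1")]

node_templates, edge_templates = collect_templates(nodes, edges)
g_functions = [make_g(*t) for t in sorted(node_templates)]
h_functions = [make_h(*t) for t in sorted(edge_templates)]
```

Each closure returned by `make_g` or `make_h` corresponds to one binary feature with its own weight in the model, which is how a modest number of feature types can expand into a very large number of binary features.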
Experimental Results and Conclusion
The parsed training sentences yielded 90,388 predicates and 1,971,985 binary features (<math>g</math> and <math>h</math>). The precision, recall and F-scores obtained are shown in the table below.
[[File:cohn_results.jpg]]
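As a reminder of how these three scores relate, here is a small sketch. The function name and the counts in the example are hypothetical and are not results from the paper; the sketch assumes the usual convention that a predicted argument counts as correct only when both its span and its role label match the gold annotation.

```python
def precision_recall_f1(num_correct, num_predicted, num_gold):
    """Compute precision, recall and balanced F-score from argument counts.

    num_correct:   predicted arguments that match a gold argument
    num_predicted: all arguments the system predicted
    num_gold:      all arguments in the gold annotation
    """
    precision = num_correct / num_predicted if num_predicted else 0.0
    recall = num_correct / num_gold if num_gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up counts, for illustration only.
p, r, f = precision_recall_f1(num_correct=8, num_predicted=10, num_gold=16)
```

Because the F-score is a harmonic mean, it is pulled towards the lower of precision and recall, so a system cannot score well by optimising only one of them.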
Although the modeling of the problem is neat, the reported results were not on par with the best systems that competed in the CoNLL shared task. Màrquez et al. showed in their paper that modeling SRL as a sequential BIO-tagging problem still gives far better results. They used a combination of deep and shallow syntactic features, with boosting for the BIO tagging.
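To make the contrast with the tree-based formulation concrete, the sketch below shows what a BIO encoding of argument spans looks like. It is only an illustration of the general scheme: the function name, the span format (start index, exclusive end index, role) and the example sentence are assumptions, and the exact tag set used by Màrquez et al. may differ.

```python
def spans_to_bio(num_tokens, spans):
    """Convert labeled argument spans into one BIO tag per token.

    spans: list of (start, end, role) with `end` exclusive; spans are
    assumed not to overlap.
    """
    tags = ["O"] * num_tokens
    for start, end, role in spans:
        tags[start] = "B-" + role          # first token of the argument
        for i in range(start + 1, end):
            tags[i] = "I-" + role          # remaining tokens of the argument
    return tags


tokens = ["The", "cat", "chased", "the", "mouse"]
# Hypothetical annotation for the predicate "chased".
tags = spans_to_bio(len(tokens), [(0, 2, "A0"), (3, 5, "A1")])
```

With this encoding every token receives exactly one tag, so SRL for a given predicate reduces to labeling a linear sequence, whereas the tree CRF labels constituents of the parse tree directly.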
Comments
Any ideas why their approach doesn't work as well as BIO tagging? That is an interesting result. --Brendan 18:55, 13 October 2011 (UTC)
Response to the Comment (by Manaj)
Well, I guess this might have something to do with the kind of problem and the approach taken. Modeling a sequential labeling problem (such as SRL) with a CRF should give good results when the CRF is defined over sequential structures. Here, however, the CRF is defined over syntactic tree structures. The authors thought this would make sense, since the arguments in SRL are always relative to a predicate (verb) and the features generally used are syntactic. However, the results turned out not to be as good as those of many other techniques, including SVMs (see http://www.cemantix.org/papers/pradhan-hlt-2004-a.pdf or http://www.lsi.upc.edu/~srlconll/st05/papers/intro.pdf). This makes me think that a CRF over tree structures might not be a good representation of the problem itself. As for the comparison with the sequential BIO tagging in Semantic Role Labeling as Sequential Tagging, the most likely reasons why the latter significantly outperformed the former are its use of a combination of features (syntactic features and chunk features) and its use of AdaBoost with decision trees. It's worth mentioning that an empirical study (http://www.cs.cornell.edu/~caruana/ctp/ct.papers/caruana.icml06.pdf) found decision trees to be among the better-performing models.