Krishnan 2006 an effective two stage model for exploiting non local dependencies in named entity recognition

Citation

Krishnan, V. and Manning, C. D. 2006. An Effective Two-Stage Model for Exploiting Non-Local Dependencies in Named Entity Recognition. In ACL-COLING’06: Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics.

Online version

ACM Digital Library

Summary

This paper presents a simple and efficient two-stage approach that captures non-local dependencies in Named Entity Recognition (NER). The non-local dependencies the authors handle here are that similar or identical tokens (or token sequences) are likely to receive the same label. Modeling such dependencies directly is difficult, because assuming them makes inference harder. The proposed method therefore does not capture non-local dependencies directly but works in two stages. In the first stage, a conventional sequence CRF model is used to approximate aggregate statistics of labels. A second CRF is then run, using functions of those approximated aggregate label statistics as features. For a given token/entity (as labeled by the first CRF), it encourages the majority label assigned to (1) the same token, (2) the same entity, and (3) entities whose token sequence includes the current token sequence, computed either (a) within the same document or (b) over the whole corpus. When tested against previous models that capture non-local dependencies directly, this method achieved a higher relative error reduction.
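To make the two stages concrete, here is a minimal sketch of the pipeline in Python. It assumes a CRF object with an sklearn_crfsuite-style fit/predict interface over sequences of feature dicts; the helpers local_features, collect_majority_stats, and majority_features are illustrative stand-ins (sketched in the method section below), not code from the paper.

```python
# Minimal sketch of the two-stage approach. Assumes crf1 and crf2 behave
# like sklearn_crfsuite.CRF (predict over lists of feature-dict sequences).
# local_features, collect_majority_stats and majority_features are
# illustrative helpers sketched further below, not the paper's code.

def featurize(sents, feature_fns):
    """Map each tokenized sentence to a sequence of merged feature dicts."""
    return [[{k: v for fn in feature_fns for k, v in fn(sent, i).items()}
             for i in range(len(sent))]
            for sent in sents]

def two_stage_predict(crf1, crf2, sents):
    # Stage 1: label all sentences with the conventional local-feature CRF.
    first_pass = crf1.predict(featurize(sents, [local_features]))

    # Aggregate the first-pass labels into approximate majority statistics.
    stats = collect_majority_stats(sents, first_pass)

    # Stage 2: relabel with majority-label features added to the local ones.
    with_majorities = featurize(
        sents, [local_features, lambda s, i: majority_features(s, i, stats)])
    return crf2.predict(with_majorities)
```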


Brief description of the method

There are two Conditional Random Fields in this method. The first is a conventional linear-chain CRF; the local dependencies it captures are between a token and the tokens immediately before and after it. The authors use features known to be effective in NER, such as the current, previous, and next words, character n-grams of the current word, etc. For the full list of features, see the appendix.
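As an illustration, a feature function covering a few of these local features might look as follows. This is only a sketch of a subset (the exact feature set is listed in the appendix), and the dict-per-token format assumes an sklearn_crfsuite-style CRF.

```python
def word_shape(word):
    # Collapse characters into classes: uppercase -> X, lowercase -> x,
    # digit -> d; everything else is kept as-is.
    return "".join("X" if c.isupper() else "x" if c.islower() else
                   "d" if c.isdigit() else c for c in word)

def local_features(sent, i):
    """Local features for token i of a tokenized sentence (a subset of the
    appendix list: neighboring words, character n-grams, word shape)."""
    word = sent[i]
    feats = {
        "word": word.lower(),
        "shape": word_shape(word),
        "prev_word": sent[i - 1].lower() if i > 0 else "<BOS>",
        "next_word": sent[i + 1].lower() if i + 1 < len(sent) else "<EOS>",
    }
    # Character n-grams of the current word (lengths 2-4 here).
    for n in (2, 3, 4):
        for j in range(len(word) - n + 1):
            feats["char_%dgram=%s" % (n, word[j:j + n])] = True
    return feats
```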

The second CRF uses features that are functions of aggregate label statistics obtained from the first CRF. Specifically, they are of the following three types (a sketch of how such statistics could be computed follows the list):

1. Token-majority features: these features refer to the majority label assigned to the particular token in the document/corpus. They capture dependencies between identical and similar token sequences.

2. Entity-majority features: these features refer to the majority label assigned to the particular entity (as labeled by the first CRF) in the document/corpus. If the token was not labeled as part of a named entity, the feature returns the majority label assigned to that token when it occurs as a single-token named entity.

3. Superentity-majority features: these features refer to the majority label assigned to entities (as labeled by the first CRF) whose token sequences include the current entity's token sequence, in the document/corpus.
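A rough sketch of how such majority statistics could be collected from the first-stage output and turned into features. This is illustrative Python assuming one label per token with "O" marking non-entity tokens; the helper and feature names are made up, not the paper's.

```python
from collections import Counter
from itertools import groupby

def collect_majority_stats(sents, first_pass):
    """Count first-pass labels per token and per entity token sequence."""
    token_counts, entity_counts = {}, {}
    for sent, labels in zip(sents, first_pass):
        for tok, lab in zip(sent, labels):
            token_counts.setdefault(tok, Counter())[lab] += 1
        # Treat each maximal run of one non-O label as a single entity.
        pos = 0
        for lab, run in groupby(labels):
            length = len(list(run))
            if lab != "O":
                entity = tuple(sent[pos:pos + length])
                entity_counts.setdefault(entity, Counter())[lab] += 1
            pos += length
    return token_counts, entity_counts

def majority_features(sent, i, stats):
    """Token-majority feature for token i; entity- and superentity-majority
    features would be looked up analogously in entity_counts."""
    token_counts, _entity_counts = stats
    feats = {}
    if sent[i] in token_counts:
        majority = token_counts[sent[i]].most_common(1)[0][0]
        feats["token_majority=" + majority] = True
    return feats
```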

Experimental Results

The authors tested this method on the CoNLL 2003 English named entity recognition dataset. Their baseline Conditional Random Field achieved an already competitive F-measure of 85.29. Adding document-level non-local dependencies yielded a 12.6% relative error reduction over the baseline; incorporating non-local dependencies across documents (at the corpus level) as well yielded a 13.3% relative error reduction. Moreover, despite a baseline stronger than those of Bunescu and Mooney, ACL 2004 and Finkel et al., ACL 2005, the proposed method still obtained a higher relative error reduction. For more detailed results, see the table below.

[Image: results table from Krishnan and Manning, ACL 2006]
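For reference, relative error reduction here treats 100 − F as the error mass, so the reported reductions imply roughly the following final F-measures. This is simple arithmetic on the numbers above, not figures quoted from the paper.

```python
def f_after_reduction(f_base, reduction):
    # Relative error reduction acts on the error mass 100 - F.
    return f_base + (100.0 - f_base) * reduction

print(f_after_reduction(85.29, 0.126))  # document-level: about 87.14
print(f_after_reduction(85.29, 0.133))  # corpus-level:   about 87.25
```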

Related papers

[1] Bunescu, R. and Mooney, R. J. 2004. Collective Information Extraction with Relational Markov Networks. In Proceedings of ACL 2004.

[2] Finkel, J. R., Grenager, T. and Manning, C. D. 2005. Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling. In Proceedings of ACL 2005.

Appendix

Full list of features for the baseline CRF: the current, previous and next words; character n-grams of the current word; the part-of-speech tag of the current word and surrounding words; the shallow parse chunk of the current word; the shape of the current word; the surrounding word-shape sequence; the presence of a word in a left window of size 5 around the current word; and the presence of a word in a right window of size 5 around the current word.
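As a small illustration of the last two items, the window-presence features could be implemented like this (a hypothetical sketch; the feature names are made up):

```python
def window_features(sent, i, size=5):
    """Presence of each word in a left/right window of `size` tokens
    around token i (an illustrative take on the last two features above)."""
    feats = {}
    for w in sent[max(0, i - size):i]:
        feats["left_window=" + w.lower()] = True
    for w in sent[i + 1:i + 1 + size]:
        feats["right_window=" + w.lower()] = True
    return feats
```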