Krishnan 2006: An Effective Two-Stage Model for Exploiting Non-Local Dependencies in Named Entity Recognition
Citation
Krishnan, V. and Manning, C. D. 2006. An Effective Two-Stage Model for Exploiting Non-Local Dependencies in Named Entity Recognition. In ACL-COLING’06: Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics.
Online version
Summary
This paper presents a simple and efficient two-stage approach for capturing non-local dependencies in NER. The non-local dependencies the authors target are label-consistency constraints: identical or similar tokens (or token sequences) are likely to receive the same label. Modeling such dependencies directly is difficult, because adding them to the model makes inference harder. The proposed method therefore does not capture non-local dependencies directly but uses a two-stage approach.
In the first stage, a conventional sequence CRF is run to produce approximate aggregate statistics of labels. A second CRF is then trained, with functions of those approximate aggregate label statistics as additional features. For a given token or entity (as labeled by the first CRF), these features encourage agreement with the majority label assigned to (1) the same token, (2) the same entity, and (3) entities whose token sequence contains the current token sequence, computed either (a) within the same document or (b) over the whole corpus.
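To make the two-pass structure concrete, here is a minimal, self-contained sketch using the third-party sklearn-crfsuite library, toy data, and only a token-majority feature. The function names, feature choices, and data are my own illustrative assumptions, not the authors' code.

```python
# Two-pass sketch of the idea (not the authors' implementation): a first CRF labels
# the data, document-level majority labels of its predictions become extra features,
# and a second CRF is trained on local + aggregate features.
from collections import Counter
import sklearn_crfsuite  # pip install sklearn-crfsuite

def local_feats(sent):
    # Deliberately minimal local features; the real system uses many more (see appendix).
    return [{"w": w.lower()} for w in sent]

def token_majority(sentences, predictions):
    # Majority label assigned by the first CRF to each token type within the document.
    counts = {}
    for sent, pred in zip(sentences, predictions):
        for w, lab in zip(sent, pred):
            counts.setdefault(w.lower(), Counter())[lab] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def stage2_feats(sent, majority):
    # Local features plus a function of the first stage's aggregate label statistics.
    return [dict(f, token_majority=majority.get(w.lower(), "O"))
            for f, w in zip(local_feats(sent), sent)]

# Toy document with BIO-style labels (purely illustrative data).
doc = [["Tanjug", "reported", "the", "deal", "."],
       ["Tanjug", "is", "a", "news", "agency", "."]]
gold = [["I-ORG", "O", "O", "O", "O"],
        ["I-ORG", "O", "O", "O", "O", "O"]]

crf1 = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf1.fit([local_feats(s) for s in doc], gold)
stage1 = crf1.predict([local_feats(s) for s in doc])

majority = token_majority(doc, stage1)
crf2 = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf2.fit([stage2_feats(s, majority) for s in doc], gold)
print(crf2.predict([stage2_feats(s, majority) for s in doc]))
```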
Brief description of the method
The method uses two Conditional Random Fields. The first is a conventional linear-chain CRF whose dependencies are local, i.e. between neighboring labels. It uses features known to be effective for NER, such as the current, previous, and next words and character n-grams of the current word. The full feature list is given in the appendix.
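For concreteness, here is a small sketch, my own illustration rather than the authors' code, of how a few of these local features might be produced as a per-token feature dictionary in the style used by common CRF toolkits:

```python
def baseline_token_features(sent, i, n=3):
    # Per-token feature dict: current/previous/next words and character n-grams of
    # the current word. Illustrative only; the full feature set is listed in the appendix.
    w = sent[i]
    feats = {
        "word": w.lower(),
        "prev_word": sent[i - 1].lower() if i > 0 else "<S>",
        "next_word": sent[i + 1].lower() if i + 1 < len(sent) else "</S>",
    }
    for j in range(len(w) - n + 1):
        feats["char_%dgram_%s" % (n, w[j:j + n].lower())] = True
    return feats

# Example: features for "Clinton" in a short sentence.
print(baseline_token_features(["President", "Clinton", "spoke"], 1))
```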
The second CRF uses features that are functions of the aggregate label statistics obtained from the first CRF's output. Specifically, the features are of the following three types (a sketch of how the underlying statistics could be computed follows the list):
1. Token-majority features: these features refer to the majority label assigned to the particular token in the document/corpus. These capture dependencies between occurrences of the same or similar token sequences.
2. Entity-majority features: these features refer to the majority label assigned to the particular entity (as labeled by the first CRF) in the document/corpus. If a token was not labeled as part of an entity by the first CRF, the feature returns the majority label assigned to that token when it occurs as a single-token entity.
3. Superentity-majority features: these features refer to the majority label assigned to entities whose token sequences strictly contain the current entity's token sequence, in the document/corpus. These capture dependencies between an entity and the longer entities that contain it.
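All three feature types are simple functions of the first stage's predicted labels. Below is a rough sketch of how the document-level majority statistics behind them could be computed; the function name, the simplified BIO handling, and the toy example are my own assumptions, not the paper's code. Pooling the same counts over every document would give the corpus-level variants.

```python
from collections import Counter

def majority_label_stats(tokens, labels):
    # Sketch: from one document's tokens and first-stage BIO labels, build
    # majority-label lookups for (1) each token, (2) each predicted entity string,
    # and (3) each entity's super-entities (longer entities containing its tokens).
    token_counts, entity_counts = {}, {}

    for tok, lab in zip(tokens, labels):
        token_counts.setdefault(tok.lower(), Counter())[lab] += 1

    # Collect entities proposed by the first CRF (simplified BIO handling).
    i = 0
    while i < len(labels):
        if labels[i] == "O":
            i += 1
            continue
        etype = labels[i].split("-")[-1]
        j = i + 1
        while j < len(labels) and labels[j] == "I-" + etype:
            j += 1
        entity_counts.setdefault(tuple(t.lower() for t in tokens[i:j]), Counter())[etype] += 1
        i = j

    token_majority = {t: c.most_common(1)[0][0] for t, c in token_counts.items()}
    entity_majority = {e: c.most_common(1)[0][0] for e, c in entity_counts.items()}

    def contains(longer, shorter):
        # True if `shorter` occurs as a contiguous sub-sequence of a strictly longer entity.
        return len(longer) > len(shorter) and any(
            longer[k:k + len(shorter)] == shorter
            for k in range(len(longer) - len(shorter) + 1))

    superentity_majority = {}
    for span in entity_counts:
        votes = Counter()
        for other, counts in entity_counts.items():
            if contains(other, span):
                votes.update(counts)
        if votes:
            superentity_majority[span] = votes.most_common(1)[0][0]

    return token_majority, entity_majority, superentity_majority

# Toy example: the single token "Australia" inherits the ORG majority of its
# super-entity "Bank of Australia" via the third lookup.
toks = ["The", "Bank", "of", "Australia", "said", "Australia", "grew", "."]
labs = ["O", "B-ORG", "I-ORG", "I-ORG", "O", "B-LOC", "O", "O"]
print(majority_label_stats(toks, labs))
```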
Experimental Results
The authors evaluate the method on the CoNLL 2003 English named entity recognition dataset. Their baseline CRF already achieves a competitive F-measure of 85.29. Adding document-level non-local dependency features yields a 12.6% relative error reduction over the baseline, and incorporating non-local dependencies across documents (at the corpus level) as well yields a 13.3% relative error reduction. Moreover, despite the stronger baseline, the proposed method obtains a larger relative error reduction than the comparable approaches of Bunescu and Mooney, ACL 2004 and Finkel et al., ACL 2005. See the results table in the paper for the detailed numbers.
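To see what these relative error reductions imply in absolute terms, the F-measures can be backed out from the baseline score. This is only a quick derivation from the numbers quoted above; the exact values in the paper's table may differ slightly due to rounding.

```python
# Back out the implied F-measures from the reported relative error reductions.
baseline_f1 = 85.29
error = 100 - baseline_f1                 # 14.71 points of error

doc_level = 100 - error * (1 - 0.126)     # 12.6% relative error reduction
corpus_level = 100 - error * (1 - 0.133)  # 13.3% relative error reduction

print(round(doc_level, 2), round(corpus_level, 2))  # roughly 87.14 and 87.25
```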
Related papers
[1] Bunescu, R. and Mooney, R. J. 2004. Collective Information Extraction with Relational Markov Networks. In ACL'04: Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics.
Appendix
Full list of features for the baseline CRF: the current, previous, and next words; character n-grams of the current word; the part-of-speech tags of the current and surrounding words; the shallow parse chunk of the current word; the shape of the current word; the surrounding word-shape sequence; the presence of a word in a window of size 5 to the left of the current word; and the presence of a word in a window of size 5 to the right of the current word.
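As an illustration of two of the less standard items in this list, here is a rough sketch of a word-shape feature and the window-presence features. The shape convention (uppercase to X, lowercase to x, digit to d) is a common assumption of mine; the paper does not spell out its exact shape definition.

```python
import re

def word_shape(w):
    # Collapse characters into classes: uppercase -> X, lowercase -> x, digit -> d.
    shape = re.sub(r"[A-Z]", "X", w)
    shape = re.sub(r"[a-z]", "x", shape)
    shape = re.sub(r"[0-9]", "d", shape)
    return shape

def window_presence_features(sent, i, size=5):
    # Indicator features for words appearing within `size` tokens to the left/right,
    # plus the shape of the current word.
    feats = {"shape": word_shape(sent[i])}
    for w in sent[max(0, i - size):i]:
        feats["left_window_" + w.lower()] = True
    for w in sent[i + 1:i + 1 + size]:
        feats["right_window_" + w.lower()] = True
    return feats

# Example: features for "IBM" in a short sentence.
print(window_presence_features(["Shares", "of", "IBM", "Corp.", "rose"], 2))
```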