Finkel and Manning, EMNLP 2009. Nested Named Entity Recognition

Nested Named Entity Recognition, by J. R. Finkel and C. D. Manning. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, 2009.

This paper is available online [1].


This paper addresses a variant of the Named Entity Recognition (NER) problem. The authors present a method for identifying nested named entities using a discriminative constituency parser.

Nested ne.png

An example of a nested named entity in the first three tokens of the example sentence. Standard "flat" NER systems are unable to distinguish such nested entities.

Brief description of the method

The authors model each sentence as a constituency tree. Each named entity corresponds to a phrase in the tree (i.e., a subtree), and a root node spans the entire sentence. The POS tags of non-entity words are also modeled. The diagram above is one example of such a "named entity tree".
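To make the tree representation concrete, the following is a minimal sketch of how nested entity spans could be turned into such a tree. The `Node` class, the function names, the example sentence, and its POS tags and entity labels are illustrative assumptions, not taken from the paper. The sketch assumes the entity spans are properly nested (no crossing spans) and uses end-exclusive token offsets.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str                       # entity label, POS tag, or "ROOT"
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None       # set only on POS-tag leaves


def build_entity_tree(tokens, pos_tags, entities):
    """Build a tree from (start, end_exclusive, label) entity spans.

    Assumes spans are properly nested; crossing spans are not checked.
    """
    # Sort so that, among spans starting at the same token, the outermost comes first.
    spans = sorted(entities, key=lambda e: (e[0], -e[1]))
    root = Node("ROOT")
    stack = [(root, len(tokens))]    # (open node, end index of its span)
    k = 0
    for i, (tok, pos) in enumerate(zip(tokens, pos_tags)):
        # Close every entity whose span ended before this token.
        while stack[-1][1] <= i:
            stack.pop()
        # Open every entity that starts at this token.
        while k < len(spans) and spans[k][0] == i:
            _, end, label = spans[k]
            node = Node(label)
            stack[-1][0].children.append(node)
            stack.append((node, end))
            k += 1
        # Attach the token as a POS-tag leaf under the innermost open node.
        stack[-1][0].children.append(Node(pos, word=tok))
    return root


def bracket(node):
    """Render the tree in bracketed form."""
    if node.word is not None:
        return f"({node.label} {node.word})"
    return "(" + node.label + " " + " ".join(bracket(c) for c in node.children) + ")"


# Hypothetical example: a PROTEIN mention nested inside a DNA mention.
tokens = ["mouse", "GM-CSF", "promoter", "binds"]
pos_tags = ["NN", "NN", "NN", "VBZ"]
entities = [(0, 3, "DNA"), (1, 2, "PROTEIN")]
print(bracket(build_entity_tree(tokens, pos_tags, entities)))
# (ROOT (DNA (NN mouse) (PROTEIN (NN GM-CSF)) (NN promoter)) (VBZ binds))
```

A flat tagger would have to choose between the DNA and the PROTEIN label for "GM-CSF", whereas the tree keeps both.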


The trees are first annotated with parent and grandparent labels and binarized in a right-branching manner. The authors then train a discriminative constituency parser, based on Finkel et al. (ACL 2008), on these trees.
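The two preprocessing steps can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the tuple-based tree encoding, the `^` separator for ancestor labels, and the `@` prefix for intermediate nodes are conventions chosen here, and the order of the two steps (annotate, then binarize) is also an assumption.

```python
def annotate(tree, parent=None, grandparent=None):
    """Append parent and grandparent labels to every node label.

    A tree is (label, word) for a leaf or (label, [children]) otherwise.
    """
    label, body = tree
    ancestors = [a for a in (parent, grandparent) if a is not None]
    new_label = "^".join([label] + ancestors)
    if isinstance(body, str):
        return (new_label, body)
    # The children see this node as parent and this node's parent as grandparent.
    return (new_label, [annotate(child, label, parent) for child in body])


def binarize_right(tree):
    """Right-branching binarization: A -> B C D becomes A -> B @A, @A -> C D."""
    label, body = tree
    if isinstance(body, str):
        return tree
    kids = [binarize_right(child) for child in body]
    while len(kids) > 2:
        # Fold the two rightmost children into an intermediate node.
        kids = kids[:-2] + [("@" + label, kids[-2:])]
    return (label, kids)


# Hypothetical tree with a PROTEIN nested inside a DNA entity.
tree = ("ROOT", [
    ("DNA", [("NN", "mouse"), ("PROTEIN", [("NN", "GM-CSF")]), ("NN", "promoter")]),
    ("VBZ", "binds"),
])
processed = binarize_right(annotate(tree))
```

After annotation, the POS tag of "GM-CSF" becomes `NN^PROTEIN^DNA`, so the tag itself records which entities the word sits inside.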

The POS tags are modeled jointly with the named entities. The possible POS tags for each word are determined by distributional similarity clusters: a word may take any POS tag observed for any other word in its cluster. Because the POS tags are annotated with parent and grandparent labels, they also restrict the kinds of entities a word can be part of. For instance, verbs would not be labeled as part of any entity.
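The cluster-based tag restriction can be illustrated with a short sketch. It assumes that a word-to-cluster mapping and a list of tagged training words are already available; the function names, cluster ids, and example words are hypothetical.

```python
from collections import defaultdict


def tags_by_cluster(tagged_words, word_cluster):
    """Collect, for each cluster, every tag observed on any of its words."""
    cluster_tags = defaultdict(set)
    for word, tag in tagged_words:
        cluster_tags[word_cluster[word]].add(tag)
    return cluster_tags


def allowed_tags(word, word_cluster, cluster_tags):
    """A word may take any tag seen on any word in the same cluster."""
    return cluster_tags[word_cluster[word]]


# Hypothetical clustering: "binds" and "activates" share a cluster.
word_cluster = {"binds": 7, "activates": 7, "promoter": 3, "enhancer": 3}
tagged_words = [("binds", "VBZ"), ("activates", "VBZ"), ("activates", "NNS"),
                ("promoter", "NN^DNA"), ("enhancer", "NN")]
cluster_tags = tags_by_cluster(tagged_words, word_cluster)

# "binds" inherits the tag seen only on "activates".
print(sorted(allowed_tags("binds", word_cluster, cluster_tags)))
```

Because the tags in this setup carry ancestor annotations, restricting the tags of a word also restricts the entity types it can appear in.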

Discriminative parser

The parser used here is the discriminatively trained, conditional random field-based CFG parser of Finkel et al. (2008). It is similar to a standard chart-based PCFG parser, except that it uses clique potentials instead of probabilities over spans.
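The chart recursion shared with a PCFG parser can be sketched as below. This is not the parser of Finkel et al. (2008): it is a bare CKY search in which each rule carries an arbitrary real-valued score (an unnormalized log-potential) rather than a log-probability. In the actual model the potentials would be computed from features and learned weights; here they are simply looked up in hypothetical dictionaries, and unary rules are omitted for brevity.

```python
import math
from collections import defaultdict


def cky_best_score(words, lexical_score, binary_score, root="ROOT"):
    """Return the score of the best tree rooted in `root`, or -inf if none exists.

    lexical_score: {(tag, word): score}
    binary_score:  {(parent, left, right): score}
    Scores are added along the tree, so they act as log-potentials.
    """
    n = len(words)
    chart = defaultdict(lambda: -math.inf)   # (start, end, label) -> best score
    # Width-1 spans: score each tag that can cover the word.
    for i, w in enumerate(words):
        for (tag, word), score in lexical_score.items():
            if word == w:
                chart[i, i + 1, tag] = max(chart[i, i + 1, tag], score)
    # Wider spans: combine two adjacent sub-spans with a binary rule.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (parent, left, right), score in binary_score.items():
                    total = chart[i, k, left] + chart[k, j, right] + score
                    if total > chart[i, j, parent]:
                        chart[i, j, parent] = total
    return chart[0, n, root]


# Toy grammar with arbitrary scores; nothing here needs to sum to one.
lexical = {("X", "a"): 1.0, ("Y", "b"): 0.5, ("Y", "c"): 0.5}
binary = {("Z", "Y", "Y"): 2.0, ("ROOT", "X", "Z"): 1.0}
print(cky_best_score(["a", "b", "c"], lexical, binary))  # 5.0
```

Replacing `max` with a log-sum-exp over the same chart would give the log partition function needed for CRF training, which is the main place where the potentials differ in use from PCFG probabilities.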


Because the model is nested, the authors were able to use nested features in addition to those found in standard CRF-based NER systems. Each word is labeled with its cluster from the distributional similarity clustering. Local named entity features are computed for each entity a word may be part of. Similarly, pairs of adjacent tokens receive pairwise named entity features if they are siblings in a subtree. Features are also used for cases where one entity is embedded in another.

Experimental Results

The authors performed experiments on the GENIA corpus, the JNLPBA corpus, and AnCora.

GENIA results.png

Their system achieves significant performance gains over a comparable flat semi-CRF NER system.

JNLPBA results.png

Ancora results.png

The authors' system generally performs better than flat models when evaluated on all entities rather than only on top-level entities. This demonstrates the relevance of modeling the hierarchy of named entities in an NER system.
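The difference between the two evaluation settings can be illustrated with a small sketch. It assumes exact-match scoring over (start, end, label) spans and treats an entity as top-level when no strictly larger entity contains it. The paper's actual scoring script is not described here, so the function names and the example data are hypothetical.

```python
def prf(gold, pred):
    """Exact-match precision, recall, and F1 over sets of (start, end, label)."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def top_level(entities):
    """Keep only entities that are not contained in a strictly larger entity."""
    ents = set(entities)
    return {
        e for e in ents
        if not any(
            o[0] <= e[0] and e[1] <= o[1] and (o[0], o[1]) != (e[0], e[1])
            for o in ents
        )
    }


# Hypothetical gold annotation with nesting, and a flat prediction.
gold = [(0, 3, "DNA"), (1, 2, "PROTEIN")]
flat_pred = [(0, 3, "DNA")]

print(prf(top_level(gold), top_level(flat_pred)))  # perfect on top-level entities
print(prf(gold, flat_pred))                        # recall drops on all entities
```

In this toy case the flat prediction is perfect on top-level entities but misses the embedded PROTEIN, so its recall falls to 0.5 when all entities are counted.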

Related Papers

CRFs are introduced in Lafferty_2001_Conditional_Random_Fields

Sarawagi and Cohen (2004) introduced a semi-CRF model which could be used to model the features of nested named entities.