Finkel and Manning, EMNLP 2009. Nested Named Entity Recognition

From Cohen Courses
Revision as of 20:08, 24 September 2011 by Ysim (talk | contribs)

Nested Named Entity Recognition, by J. R. Finkel and C. D. Manning. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, 2009.

This paper is available online [1].

Under construction 09/24

Summary

This paper addresses a variant of the Named Entity Recognition problem in which entities can contain other entities. The authors present a method for identifying such nested named entities using a discriminative constituency parser.

Nested ne.png

An example of a nested named entity, found in the first three tokens of the example sentence. Standard "flat" NER systems cannot represent this kind of nesting.

Brief description of the method

The authors model each sentence as a constituency tree. Each named entity corresponds to a phrase in the tree (i.e., a subtree), and a root node spans the entire sentence. The POS tags of the tokens, including those outside any entity, are also modeled as nodes in the tree. The diagram above is one example of such a "named entity tree".
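The tree construction described above can be sketched as follows. This is only an illustration, not the authors' code: the function names `build_entity_tree` and `to_bracket`, the tuple-based tree representation, and the example sentence with its labels are all assumptions made for this sketch. It assumes entity spans are properly nested (no crossing spans).

```python
def build_entity_tree(tokens, pos_tags, entities):
    """Build a (label, children) tree from tokens, POS tags, and entity spans.

    entities: list of (start, end, label) with end exclusive.
    Assumes spans are properly nested, i.e. no two spans cross.
    """
    # Outer spans must be opened before the spans they contain.
    spans = sorted(entities, key=lambda e: (e[0], -e[1]))
    root = ("ROOT", [])
    # Stack of (children list, end index) for the currently open nodes.
    stack = [(root[1], len(tokens))]
    next_span = 0
    for i, (token, tag) in enumerate(zip(tokens, pos_tags)):
        # Close every entity that ended before this token.
        while stack[-1][1] <= i:
            stack.pop()
        # Open every entity that starts at this token.
        while next_span < len(spans) and spans[next_span][0] == i:
            _, end, label = spans[next_span]
            node = (label, [])
            stack[-1][0].append(node)
            stack.append((node[1], end))
            next_span += 1
        # Each token hangs under its POS tag, inside the innermost open entity.
        stack[-1][0].append((tag, [token]))
    return root


def to_bracket(tree):
    """Render a tree in bracketed (Penn Treebank style) notation."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    return "(" + label + " " + " ".join(to_bracket(c) for c in children) + ")"


# Hypothetical example, not taken from the paper.
tokens = ["University", "of", "California", "researchers", "met"]
pos_tags = ["NNP", "IN", "NNP", "NNS", "VBD"]
entities = [(0, 3, "ORG"), (2, 3, "GPE")]
tree = build_entity_tree(tokens, pos_tags, entities)
print(to_bracket(tree))
```

In this toy sentence the GPE entity ends up as a subtree of the ORG entity, which is the kind of structure a flat tagging scheme cannot express.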

Annotated.png

The trees are first annotated with parent and grandparent labels and binarized in a right-branching manner. The authors then train a discriminative constituency parser based on Finkel et al. (2008).
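A minimal sketch of these two preprocessing steps is given below. The label format (`LABEL^PARENT^GRANDPARENT`), the `@` prefix for intermediate nodes, the function names, and the order in which the two steps are applied are assumptions of this sketch; the paper's exact conventions may differ. Trees are `(label, children)` tuples with string leaves.

```python
def annotate(tree, parent="", grandparent=""):
    """Append the parent and grandparent labels to every node label."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    new_label = "^".join(x for x in (label, parent, grandparent) if x)
    # Children see the bare (unannotated) label of this node as their parent.
    return (new_label, [annotate(c, label, parent) for c in children])


def binarize_right(tree):
    """Right-branching binarization: no node keeps more than two children."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    kids = [binarize_right(c) for c in children]
    while len(kids) > 2:
        # Fold the two rightmost children under an intermediate node.
        kids = kids[:-2] + [("@" + label, kids[-2:])]
    return (label, kids)


def max_arity(tree):
    """Largest number of children of any node (used to check the result)."""
    if isinstance(tree, str):
        return 0
    return max([len(tree[1])] + [max_arity(c) for c in tree[1]])


# Hypothetical named entity tree, not taken from the paper.
tree = ("ROOT", [
    ("ORG", [("NNP", ["University"]), ("IN", ["of"]),
             ("GPE", [("NNP", ["California"])])]),
    ("NNS", ["researchers"]),
    ("VBD", ["met"]),
])
result = binarize_right(annotate(tree))
```

After annotation, the POS node above "California" carries the label `NNP^GPE^ORG`, so the tag itself records which entities the word sits inside.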

The POS tags are modeled jointly with the named entities. The possible POS tags for each word are determined by distributional similarity: words are grouped into clusters, and a word may take any POS tag observed for any other word in its cluster. Because the POS tags are annotated with parent and grandparent labels, the tag of a word constrains the kinds of entities it can belong to. For instance, verbs would not be labeled as part of any entity.
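The cluster-based tag sharing can be illustrated with a small sketch. The function name `build_tag_lookup`, the cluster ids, and the toy vocabulary are hypothetical; how the clusters themselves are computed is not covered here.

```python
from collections import defaultdict


def build_tag_lookup(tagged_words, word_to_cluster):
    """Return a function mapping a word to its set of allowed POS tags.

    A word may take any tag seen with it in training, plus any tag seen
    with another word in the same distributional-similarity cluster.
    """
    word_tags = defaultdict(set)
    cluster_tags = defaultdict(set)
    for word, tag in tagged_words:
        word_tags[word].add(tag)
        cluster = word_to_cluster.get(word)
        if cluster is not None:
            cluster_tags[cluster].add(tag)

    def lookup(word):
        cluster = word_to_cluster.get(word)
        return word_tags.get(word, set()) | cluster_tags.get(cluster, set())

    return lookup


# Hypothetical training data and clusters, for illustration only.
tagged_words = [("binds", "VBZ"), ("activates", "VBZ"), ("protein", "NN")]
word_to_cluster = {"binds": 7, "activates": 7, "inhibits": 7, "protein": 3}
lookup = build_tag_lookup(tagged_words, word_to_cluster)
```

Here `lookup("inhibits")` yields the tags of its cluster even though the word never appeared in the tagged data, while a word with no cluster and no training occurrences gets an empty set.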

Discriminative parser

The parser used here is the discriminatively trained CRF-CFG parser (a conditional random field over context-free grammar parses) of Finkel et al. (2008). It is similar to a standard chart-based PCFG parser, except that rules over spans are scored with clique potentials instead of probabilities.
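To make the contrast with a PCFG concrete, the sketch below shows chart (CKY) decoding in which each rule application receives an arbitrary log-potential that may depend on the span, rather than a rule probability. It is a simplified illustration, not the parser of Finkel et al. (2008): the function names and toy scores are assumptions, unary rules are ignored, and only the best-scoring parse is found (training a CRF would require summing over parses instead of maximizing).

```python
import math
from itertools import product

NEG_INF = -math.inf


def cky_best_score(words, labels, lexical_score, binary_score, root="ROOT"):
    """Best total log-potential of a binary tree rooted in `root`.

    lexical_score(label, word) and
    binary_score(parent, left, right, start, split, end)
    return log-potentials; -inf means the configuration is not allowed.
    """
    n = len(words)
    chart = {}  # (start, end, label) -> best log-potential for that span
    for i, word in enumerate(words):
        for a in labels:
            s = lexical_score(a, word)
            if s > NEG_INF:
                chart[(i, i + 1, a)] = s
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            for split in range(start + 1, end):
                for a, b, c in product(labels, repeat=3):
                    left = chart.get((start, split, b), NEG_INF)
                    right = chart.get((split, end, c), NEG_INF)
                    if left == NEG_INF or right == NEG_INF:
                        continue
                    # Potentials add in log space, like log-probabilities.
                    s = left + right + binary_score(a, b, c, start, split, end)
                    if s > chart.get((start, end, a), NEG_INF):
                        chart[(start, end, a)] = s
    return chart.get((0, n, root), NEG_INF)


# Toy scores, hypothetical and for illustration only.
def lexical_score(label, word):
    return {("X", "a"): 1.0, ("Y", "b"): 2.0}.get((label, word), NEG_INF)


def binary_score(parent, left, right, start, split, end):
    return 0.5 if (parent, left, right) == ("ROOT", "X", "Y") else NEG_INF


best = cky_best_score(["a", "b"], ["ROOT", "X", "Y"], lexical_score, binary_score)
no_parse = cky_best_score(["b", "a"], ["ROOT", "X", "Y"], lexical_score, binary_score)
```

Because the scores are unnormalized potentials, `binary_score` is free to use features of the whole span, which a PCFG rule probability cannot do.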

Features

Experimental Result

Dataset

The authors have released the TwitterNER dataset and source code for the paper. The demo and data are available online at [2].

Related Papers