Finkel and Manning, EMNLP 2009. Nested Named Entity Recognition
Revision as of 20:13, 24 September 2011

Nested Named Entity Recognition, by J. R. Finkel and C. D. Manning. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, 2009.

This paper is available online [1].

Under construction 09/24

Summary

This paper focuses on a variant of the named entity recognition problem: the authors present a method for identifying nested named entities using a discriminative constituency parser.

[Figure: an example of a nested named entity in the first 3 tokens of the example sentence, which standard "flat" NER systems are unable to distinguish.]

Brief description of the method

The authors model each sentence as a constituent tree. Each named entity corresponds to a phrase in the tree (i.e., a subtree), and a root node spans the entire sentence. In addition, the POS tags of non-entity words are also modeled. The diagram above is one such example of a "named entity tree".
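The "named entity tree" idea can be sketched with a minimal data structure. This is not the paper's implementation; the `(label, children)` tuple encoding, the example sentence, and the entity label set are all hypothetical, chosen only to show how nested entities fall out naturally as subtrees:

```python
ENTITY_LABELS = {"ORG", "GPE", "PER"}   # hypothetical entity label set

def leaves(node):
    """Return the words under a (sub)tree, left to right."""
    label, rest = node
    if isinstance(rest, str):           # leaf: (POS, word)
        return [rest]
    words = []
    for child in rest:
        words.extend(leaves(child))
    return words

def entities(node):
    """Collect (label, phrase) for every entity subtree, nested ones included."""
    label, rest = node
    found = []
    if label in ENTITY_LABELS:
        found.append((label, " ".join(leaves(node))))
    if not isinstance(rest, str):
        for child in rest:
            found.extend(entities(child))
    return found

# A hypothetical sentence with a GPE nested inside an ORG.
tree = ("ROOT", [
    ("DT", "The"),
    ("ORG", [("NNP", "University"), ("IN", "of"),
             ("GPE", [("NNP", "Washington")])]),
    ("VBD", "opened"),
])

print(entities(tree))
# → [('ORG', 'University of Washington'), ('GPE', 'Washington')]
```

A flat NER tagger must pick one of the two overlapping entities; in the tree view, both are simply subtrees.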

[Figure: the example tree after annotation and binarization.]

The trees are first annotated with parent and grandparent labels and binarized in a right-branching manner. The authors then train a discriminative constituency parser based on Finkel et al. (2008).
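The annotation-and-binarization step can be illustrated on the tuple trees above. This is a rough sketch, not the paper's code: it threads only the parent label (grandparent annotation works the same way with one more ancestor), and the `^`/`@` label conventions are assumptions borrowed from common parsing practice:

```python
def binarize(node, parent="TOP"):
    """Right-branching binarization with parent annotation.

    Nodes are (label, children) tuples; leaves are (POS, word) tuples.
    """
    label, children = node
    ann = f"{label}^{parent}"               # annotate with the parent label
    if isinstance(children, str):           # leaf: just annotate the tag
        return (ann, children)
    kids = [binarize(c, parent=label) for c in children]
    # Fold children right-to-left so every node has at most two children;
    # intermediate nodes get an "@"-prefixed label.
    while len(kids) > 2:
        kids = kids[:-2] + [(f"@{ann}", kids[-2:])]
    return (ann, kids)

t = binarize(("S", [("A", "a"), ("B", "b"), ("C", "c")]))
print(t)
# → ('S^TOP', [('A^S', 'a'), ('@S^TOP', [('B^S', 'b'), ('C^S', 'c')])])
```

Folding from the right gives the right-branching structure the paper describes, and the ancestor annotations are what later restrict which entities a given POS tag can sit under.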

The POS tags are jointly modeled with the named entities. The possible POS tags for each word are determined by distributional-similarity clustering: a word may take any POS tag observed for other words in its cluster. Because parent and grandparent labels are annotated onto the POS tags, words are limited in the kinds of entities they can appear in; for instance, verbs would not be labeled as part of any entity.

Discriminative parser

The parser used here is the discriminatively trained, conditional random field based CRF-CFG parser of Finkel et al. (2008). It is similar to a standard chart-based PCFG parser, except that clique potentials are used instead of probabilities over spans.
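The "potentials instead of probabilities" point can be made concrete with a toy Viterbi CKY pass. This is a sketch, not the CRF-CFG parser itself: the grammar, rule scores, and sentence are invented, and the key property shown is only that the per-rule log-scores need not normalize to a probability distribution:

```python
import math

# Toy binarized grammar with log-space clique potentials. Unlike PCFG rule
# probabilities, these scores are arbitrary reals and need not sum to 1.
binary = {("NP", ("NP", "PP")): 0.5, ("PP", ("IN", "NP")): 1.2}
lexical = {("NP", "dogs"): 0.3, ("IN", "in"): 0.8, ("NP", "houses"): 0.1}

def viterbi_cky(words):
    """Best log-score per label over the whole sentence."""
    n = len(words)
    # chart[i][j] maps label -> best log-score over span words[i:j]
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for (label, word), s in lexical.items():
            if word == w:
                chart[i][i + 1][label] = s
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):           # split point
                for (a, (b, c)), s in binary.items():
                    if b in chart[i][k] and c in chart[k][j]:
                        score = s + chart[i][k][b] + chart[k][j][c]
                        if score > chart[i][j].get(a, -math.inf):
                            chart[i][j][a] = score
    return chart[0][n]

print(viterbi_cky("dogs in houses".split()))
```

Swapping the max for a log-sum-exp over split points would give the inside scores needed for CRF training; the chart mechanics are otherwise identical to a PCFG parser.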

Features

Due to the nested nature of the model, the authors can use both the features found in standard CRF-based NER systems and features that are not possible with a CRF. Each word is labeled with its cluster from the distributional-similarity clustering above. Local named entity features are computed for each entity a word may be part of. Similarly, pairs of adjacent tokens receive pairwise named entity features if they are siblings in a subtree. Features are also used for cases where entities are embedded in one another.
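One thing a flat CRF cannot express is the full stack of entity labels covering a token. A sketch of extracting that stack, reusing the hypothetical `(label, children)` tuple trees and entity label set from earlier (not the paper's feature code):

```python
ENTITY_LABELS = {"ORG", "GPE"}          # hypothetical entity label set

def label_stacks(node, stack=(), out=None):
    """For each token, the entity labels covering it, outermost first."""
    if out is None:
        out = []
    label, rest = node
    if isinstance(rest, str):           # leaf: (POS, word)
        out.append((rest, list(stack)))
        return out
    if label in ENTITY_LABELS:
        stack = stack + (label,)
    for child in rest:
        label_stacks(child, stack, out)
    return out

tree = ("ROOT", [
    ("ORG", [("NNP", "University"), ("IN", "of"),
             ("GPE", [("NNP", "Washington")])]),
    ("VBD", "opened"),
])

print(label_stacks(tree))
# → [('University', ['ORG']), ('of', ['ORG']),
#    ('Washington', ['ORG', 'GPE']), ('opened', [])]
```

Features over these stacks (and over sibling pairs within a subtree) are exactly the kind of embedded-entity information the tree model can condition on.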

Experimental Result

Dataset

The authors have released the TwitterNER dataset and the source code for the paper. The demo and data are available online at [2].

Related Papers