Talukdar et al CoNLL 2006


Citation

Talukdar, P. P., Brants, T., Liberman, M. and Pereira, F. "A Context Pattern Induction Method for Named Entity Extraction." Computational Natural Language Learning (CoNLL-X), 2006.

Online Version

[1]

Summary

This paper extends previous methods for pattern induction and uses the induced patterns to find new instances of interest, which in turn assist in named entity recognition. This is a form of semi-supervised learning, using unlabeled data to derive new features. The method is language-independent, relying on word and transition frequencies rather than on chunking or parsing information.

The method starts with seed instances and uses them to find contexts frequently associated with the seeds. Rather than use the contexts directly, it then applies IDF to find trigger words that are rare in the corpus overall yet frequent within the extracted contexts. These dominating words are later used to define patterns. Using IDF alone, without accounting for a word's frequency in the relevant contexts, would lower precision.
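
A minimal sketch of this scoring step, assuming contexts have already been extracted around the seeds and a corpus document-frequency table is available; the function name and the exact way in-context frequency is combined with IDF are illustrative, not the paper's formula.

 import math
 from collections import Counter

 def score_trigger_words(contexts, doc_freq, num_docs):
     """Rank words that are frequent in seed contexts but rare in the corpus.

     contexts: list of token lists extracted around seed instances
     doc_freq: corpus document frequency per word
     num_docs: total number of documents in the corpus
     """
     in_context = Counter(tok for ctx in contexts for tok in set(ctx))
     scores = {}
     for word, count in in_context.items():
         idf = math.log(num_docs / (1 + doc_freq.get(word, 0)))
         # Weight corpus rarity (IDF) by in-context frequency; IDF alone
         # would promote words that are merely rare, hurting precision.
         scores[word] = count * idf
     return sorted(scores.items(), key=lambda kv: -kv[1])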

The dominating words mark the start of phrases surrounding the entity of interest. These phrases are used to induce finite state automata, in an effort to generalize beyond the literal phrases. The automata are pruned by removing transitions that few phrases pass through (as opposed to transitions that merely have a low local weight).
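
A toy sketch of the induction and pruning step, treating the induced automaton as a trie over trigger-anchored phrases; the pruning criterion (number of phrases traversing a transition, rather than the transition's local weight) follows the description above, but the trie representation is an assumption.

 from collections import defaultdict

 def induce_automaton(phrases):
     """Build a trie-shaped automaton over phrases, counting how many
     phrases traverse each transition."""
     trans = {}                  # (state, token) -> next state
     traffic = defaultdict(int)  # (state, token) -> number of phrases on edge
     n_states = 1                # state 0 is the start state
     for phrase in phrases:
         state = 0
         for tok in phrase:
             if (state, tok) not in trans:
                 trans[(state, tok)] = n_states
                 n_states += 1
             traffic[(state, tok)] += 1
             state = trans[(state, tok)]
     return trans, traffic

 def prune(trans, traffic, min_paths=2):
     """Keep only transitions that enough distinct phrases pass through,
     regardless of how heavy a transition looks locally."""
     return {edge: s for edge, s in trans.items() if traffic[edge] >= min_paths}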

The patterns read off the automata are used to find new entity instances, which populate lists. During this process the patterns are further filtered to favor precision at the cost of recall. High-quality entities extracted by high-quality patterns are added to the seed lists, and the procedure starts over.
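
The outer bootstrapping loop could be organized as below; every callable passed in is a hypothetical stand-in for one of the steps described above, not an API from the paper.

 def bootstrap(seeds, corpus, extract_contexts, induce_patterns,
               filter_patterns, match_patterns, is_high_quality,
               iterations=3):
     """Grow a seed list by alternating pattern induction and extraction.
     All callables are hypothetical stand-ins for the steps above."""
     entities = set(seeds)
     for _ in range(iterations):
         contexts = extract_contexts(corpus, entities)
         # Filter patterns aggressively: precision is favored over recall.
         patterns = filter_patterns(induce_patterns(contexts))
         for entity in match_patterns(corpus, patterns):
             if is_high_quality(entity):
                 entities.add(entity)
     return entities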

The induced lists were used as features to improve the performance of CRF-based entity taggers. The authors showed that inducing lists from extra unlabeled data improved the taggers' generalization performance, whereas lists built only from the training data showed a strong tendency to overfit.
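
One common way to feed such lists into a CRF tagger is a binary membership feature per token and list; this sketch assumes a generic token-to-feature-dict interface rather than any particular CRF package.

 def list_membership_features(tokens, induced_lists):
     """Emit a binary feature per token for each induced list it appears in,
     e.g. {'in_list:city': 1} for a token found in the induced city list."""
     features = []
     for tok in tokens:
         tok_feats = {}
         for name, entries in induced_lists.items():
             if tok.lower() in entries:
                 tok_feats["in_list:" + name] = 1
         features.append(tok_feats)
     return features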

Related Papers

Riloff and Jones, NCAI 1999 and Etzioni, AIJ 2005 use pattern induction over noun phrases, which makes their methods more language-dependent than this one.

Agichtein and Gravano, ICDL 2000 also induce patterns, but apply them to relation extraction.

Wang and Cohen, ICDM 2007 introduce a set-expansion method that is likewise language-independent, relying on lists present in the pages from which it extracts.

Summary

This paper extends standard CRFs to allow each state to correspond to more than one word or token, much as semi-HMMs extend HMMs. This allows a richer feature set to be modeled, since features can now span multiple words rather than a single word. Such features are quite beneficial in a range of applications where entities tend to be longer than one word, including NP-chunking and NER.

As with CRFs, a semi-CRF applies one exponential model over the whole sequence. However, instead of modeling a sequence of words, it models a sequence of segments, each of which is a run of consecutive words assigned to the same state. This expands the search space, so the Viterbi-like recursion used for inference must also maximize over segment boundaries. The consequence is relatively minor: inference still takes polynomial time. The cost is lower than for higher-order CRFs, which consider all combinations of the previous L states, whereas semi-CRFs consider only the case where the previous L states are identical. Training the model is not much harder either: the likelihood is still convex, and a similar recursion yields the normalizer.
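
In log space, the segment-level Viterbi recursion might look like the following sketch; score is a hypothetical stand-in for the model's weighted feature sum over a segment, and bounding segment length by max_len keeps inference at O(n * max_len * |labels|^2).

 def semi_crf_viterbi(n, labels, max_len, score):
     """Best segmentation of n tokens. best[i][y] is the score of the best
     labeling of tokens 0..i-1 whose last segment ends at i with label y.
     score(start, end, y, y_prev) is a stand-in for the model's (log-space)
     weighted features on segment [start, end) with label y after y_prev."""
     NEG = float("-inf")
     best = [{y: NEG for y in labels} for _ in range(n + 1)]
     back = [{y: None for y in labels} for _ in range(n + 1)]
     for y in labels:
         best[0][y] = 0.0
     for i in range(1, n + 1):
         for y in labels:
             # Maximize over the last segment's length and previous label.
             for d in range(1, min(max_len, i) + 1):
                 for y_prev in labels:
                     s = best[i - d][y_prev] + score(i - d, i, y, y_prev)
                     if s > best[i][y]:
                         best[i][y] = s
                         back[i][y] = (i - d, y_prev)
     # Recover (start, end, label) segments from the back-pointers.
     y = max(labels, key=lambda lab: best[n][lab])
     segments, i = [], n
     while i > 0:
         j, y_prev = back[i][y]
         segments.append((j, i, y))
         i, y = j, y_prev
     return list(reversed(segments))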

The method was then tested on various datasets for NER tasks and compared to standard CRFs. The key ingredient was the choice of richer features in the semi-CRF models. These segment-level features included the number of capital letters in a segment, the segment length, and dictionary features that allow inexact matches. Segment length in particular can be modeled under any distribution (such as a Gaussian or an exponential) depending on how the feature is defined, a commonly touted benefit of semi-HMMs over regular HMMs. The results indicate that the semi-CRFs outperformed the regular CRFs in almost all cases, sometimes by quite large margins.
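
The segment-level features mentioned above might be computed along these lines; the feature names and the Jaccard-based inexact dictionary match are illustrative choices, not the paper's exact feature set.

 def jaccard(a, b):
     """Toy token-level Jaccard similarity for inexact dictionary matches."""
     sa, sb = set(a.split()), set(b.split())
     return len(sa & sb) / max(1, len(sa | sb))

 def segment_features(tokens, start, end, dictionary):
     """Example segment-level features for the span tokens[start:end]."""
     seg = tokens[start:end]
     text = " ".join(seg).lower()
     sim = max((jaccard(text, entry) for entry in dictionary), default=0.0)
     return {
         "num_capitals": sum(ch.isupper() for tok in seg for ch in tok),
         # One indicator per observed length lets the model fit an arbitrary
         # length distribution rather than a fixed parametric one.
         "length=" + str(end - start): 1,
         "dict_sim>0.8": 1 if sim > 0.8 else 0,
     }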

Related Papers

Skounakis, IJCAI 2003 applies hierarchical HMMs to IE; these model segments as semi-CRFs do, but the segments are themselves Markov processes.

Okanohara, ACL 2006 improve the speed of semi-CRFs when entities are very long, using a filtering process and a feature forest model.

Andrew, EMNLP 2006 combines semi-CRFs with traditional CRFs in order to use both segment-level and word-level features, since some word-level features are not well represented in the semi-CRF model. He demonstrates improved performance on Chinese word segmentation.