Smith and Osborne CoNLL 2006


Citation

Smith, A. and Osborne, M. "Using Gazetteers in Discriminative Information Extraction." Computational Natural Language Learning (CoNLL-X), 2006.

Online Version

[1]


Summary

This paper extends standard CRFs to allow each state to correspond to more than one word or token, similar to the way semi-HMMs extend HMMs. This allows a richer feature set to be modeled, as features can now correspond to multiple words rather than just one. Such features are beneficial in a range of applications where entities tend to span more than one word, including NP-chunking and NER.
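Concretely, the model can be sketched as follows (the notation here is the usual semi-CRF presentation and is an assumption, not necessarily the paper's exact symbols). A segmentation s breaks the input x into segments s_j = (t_j, u_j, y_j), where t_j and u_j are the segment's start and end positions and y_j its label:

```latex
% One exponential model over whole segmentations s of the input x;
% g is a vector of segment-level feature functions and w its weights.
P(s \mid x) = \frac{1}{Z(x)}
  \exp\Bigg( \sum_{j=1}^{|s|} \mathbf{w} \cdot \mathbf{g}(y_{j-1}, y_j, x, t_j, u_j) \Bigg)
```

Here Z(x) sums the same exponential over all valid segmentations of x, which is what makes the features segment-level rather than word-level.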

Similar to CRFs, a semi-CRF applies a single exponential model over the whole sequence. However, instead of modeling a sequence of words, we model a sequence of segments, each of which is a run of consecutive words belonging to the same state. This expands the space to be explored, so when performing inference the Viterbi-like recursion must also maximize over the segment boundaries. The consequence is relatively minor, with inference still taking polynomial time. This cost is lower than that of higher-order CRFs, which consider all combinations of the previous L states, whereas semi-CRFs only consider the case where the previous L states are identical. Training the model is not much harder either: the likelihood is still convex, and an analogous recursion yields the normalizer.
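A minimal sketch of that Viterbi-like recursion is below (Python; the `score` callable and the `START` sentinel are illustrative assumptions, not the paper's interface). The entry best[i][y] holds the best log-score of any segmentation of the first i tokens whose last segment ends at i with label y; the inner loops maximize over the last segment's length d up to L and over the previous label, giving O(n * L * |Y|^2) time:

```python
import math

def semi_crf_viterbi(x, labels, score, max_len):
    """Decode the best segmentation under a semi-CRF (sketch).

    x       : list of tokens
    labels  : iterable of segment labels
    score   : hypothetical callable score(x, start, end, y, y_prev)
              giving the log-potential of labelling x[start:end] with y
              when the preceding segment is labelled y_prev
    max_len : maximum segment length L
    """
    n = len(x)
    START = object()  # sentinel "label" preceding the first segment
    # best[i] maps label y to the best log-score of a segmentation of
    # x[:i] whose final segment carries label y; back[i] holds pointers
    best = [{} for _ in range(n + 1)]
    back = [{} for _ in range(n + 1)]
    best[0][START] = 0.0
    for i in range(1, n + 1):
        for y in labels:
            best_s, best_b = -math.inf, None
            # maximize over the last segment's length d and the label
            # y_prev of the segmentation ending at its left boundary
            for d in range(1, min(max_len, i) + 1):
                for y_prev, prev_s in best[i - d].items():
                    s = prev_s + score(x, i - d, i, y, y_prev)
                    if s > best_s:
                        best_s, best_b = s, (i - d, y_prev)
            best[i][y], back[i][y] = best_s, best_b
    # follow backpointers to recover (start, end, label) segments
    y = max(best[n], key=best[n].get)
    i, segments = n, []
    while i > 0:
        j, y_prev = back[i][y]
        segments.append((j, i, y))
        i, y = j, y_prev
    return list(reversed(segments))
```

The same recursion with the max replaced by log-sum-exp computes the normalizer Z(x) needed during training, which is the sense in which training is "not much harder."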

The method was then tested on various datasets for NER tasks and compared to standard CRFs. The key ingredient was the choice of richer features in the semi-CRF models. These segment-level features included the number of capital letters in a segment, the segment length, and dictionaries that allow non-exact matches. Segment length, in particular, can be modeled as any distribution (such as Gaussian or exponential) depending upon how the feature is defined, a commonly touted benefit of semi-HMMs over regular HMMs. The results indicate that semi-CRFs outperformed regular CRFs in almost all cases, sometimes by quite large margins.
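The flavour of such segment-level features can be sketched as follows (Python; the feature names and the toy gazetteer are hypothetical illustrations, not the paper's actual feature set). Because the length feature fires a separate indicator per (length, label) pair, the learned weights can approximate an arbitrary length distribution:

```python
# Toy gazetteer; a real system would load a large dictionary.
GAZETTEER = {"new", "york", "san", "francisco"}

def segment_features(x, start, end, label):
    """Hypothetical segment-level feature map for a semi-CRF.

    Returns {feature name: value} for the candidate segment
    x[start:end] labelled `label`.
    """
    seg = x[start:end]
    feats = {}
    # One indicator per (length, label): a weight per length lets the
    # model emulate any segment-length distribution.
    feats["len=%d,label=%s" % (end - start, label)] = 1.0
    # Count of capitalised words inside the segment.
    caps = sum(1 for w in seg if w[:1].isupper())
    feats["caps=%d,label=%s" % (caps, label)] = 1.0
    # Non-exact dictionary match: fire when most of the segment's
    # words appear in the gazetteer, not only on an exact hit.
    hits = sum(1 for w in seg if w.lower() in GAZETTEER)
    if hits >= max(1, len(seg) - 1):
        feats["gazetteer-soft,label=%s" % label] = 1.0
    return feats
```

None of these features can be expressed in a word-level CRF, since they depend on the whole segment at once.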

Related Papers

Skounakis, IJCAI 2003 applies hierarchical HMMs to IE; like semi-CRFs, these model segments, but the segments are themselves Markov processes.

Okanohara, ACL 2006 improves the speed of semi-CRFs when the entities are very long by using a filtering process and a feature forest model.

Andrew, EMNLP 2006 combines semi-CRFs with traditional CRFs in order to use both segment- and word-level features, since some word-level features are not well represented in the semi-CRF model, and demonstrates improved performance on the task of Chinese word segmentation.