Klein et al, CONLL 2003
Citation
Dan Klein, Joseph Smarr, Huy Nguyen and Christopher D. Manning. 2003. Named Entity Recognition with Character-Level Models. In Proceedings of CoNLL-2003.
Online version
Summary
In this paper, the authors propose using character-level representations instead of word-level representations for the Named Entity Recognition task. The first model proposed is a character-level HMM with minimal context information, and the second is a maximum-entropy conditional Markov model with rich context features.
In the character-level HMM, each character is assigned one state, which depends only on the previous state, and each character observation depends on the current state and on the previous n-1 observations. To prevent the characters of a single word from receiving different state labels, each state is represented as a pair (t, k), where t is the entity type and k is the length of time spent in that state. The value of k is capped by the n-gram history length, and the final character of an entity is marked with a special final state F.
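To make the state encoding concrete, below is a minimal Python sketch of the (t, k) labeling and the character-observation context described above. It is not the authors' implementation; the n-gram order N = 3, the padding symbol, and the helper names are assumptions made purely for illustration.

 # Minimal illustrative sketch (not the authors' code) of the (t, k) state
 # encoding and the character-observation context; N, '^', and the helper
 # names are assumptions.
 
 N = 3          # assumed n-gram order: emissions look at the previous N-1 characters
 FINAL = "F"    # special marker for the final character of an entity
 
 def char_states(word, entity_type):
     # Assign each character a state (t, k): t is the entity type, k is how
     # long we have been in that state, capped at N; the last character of
     # the entity gets the special final state F instead of a count.
     states = []
     for k in range(1, len(word) + 1):
         if k == len(word):
             states.append((entity_type, FINAL))
         else:
             states.append((entity_type, min(k, N)))
     return states
 
 def emission_context(word, i):
     # The observation at position i depends on the current state and on the
     # previous N-1 characters ('^' pads the beginning of the word).
     padded = "^" * (N - 1) + word
     return padded[i:i + N - 1]
 
 print(char_states("Klein", "PER"))
 # [('PER', 1), ('PER', 2), ('PER', 3), ('PER', 3), ('PER', 'F')]
 print(emission_context("Klein", 2))
 # 'Kl' -- the two characters preceding position 2 ('e')

Because every character of a word shares the same entity type t, decoding cannot split a single word across different entity labels, which is the point of the (t, k) construction.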
UsesDataset
A previous paper that used a character-level approach is Cucerzan and Yarowsky, SIGDAT 1999. That paper relied on prefix and suffix tries, whereas this paper uses all of the characters.