Klein et al, CONLL 2003

From Cohen Courses
Revision as of 23:09, 30 November 2010 by PastStudents (talk | contribs)

Citation

Dan Klein, Joseph Smarr, Huy Nguyen and Christopher D. Manning. 2003. Named Entity Recognition with Character-Level Models. In Proceedings of CoNLL-2003.

Online version

ACL Anthology

Summary

In this paper, the authors propose using character-level representations instead of word-level representations for the Named Entity Recognition task. The first model is a character-level HMM with minimal context information, and the second is a maximum-entropy conditional Markov model with rich context features.

In the character-level HMM, each character has its own state, which depends only on the previous state. Each character observation depends on the current state and on the previous n-1 observations. To prevent the characters of a single word from receiving different state labels, each state is represented as a pair (t, k), where t is the entity type and k is how long the model has been in that state, counted in characters. The value of k is tracked only up to the n-gram history length, and the final state of a word is represented with F.
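The (t, k) state scheme can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `successors` and `label_word`, the example entity types, and the exact transition rules (k advances by one per character and is capped at n, a word ends in F, and any entity type may follow F) are assumptions made for illustration, based on the description above.

```python
from typing import List, Tuple, Union

# A state is (entity type t, position k), where k is an int or the marker "F".
State = Tuple[str, Union[int, str]]

FINAL = "F"  # assumed marker for the final state of a word


def successors(state: State, entity_types: List[str], n: int) -> List[State]:
    """Return the states allowed to follow `state` under the assumed rules."""
    t, k = state
    if k == FINAL:
        # A word has just ended, so the next word may start with any type.
        return [(t2, 1) for t2 in entity_types]
    # Inside a word the type cannot change: either advance k (capped at n)
    # or close the word with the final state.
    return [(t, min(k + 1, n)), (t, FINAL)]


def label_word(word: str, entity_type: str, n: int) -> List[State]:
    """Return the state sequence for one word followed by its final state."""
    states: List[State] = [(entity_type, min(i, n)) for i in range(1, len(word) + 1)]
    states.append((entity_type, FINAL))
    return states


if __name__ == "__main__":
    path = label_word("Dan", "PERSON", n=2)
    print(path)
    # Every step of the path is a legal transition, so the type never changes mid-word.
    for prev, nxt in zip(path, path[1:]):
        assert nxt in successors(prev, ["PERSON", "O"], n=2)
```

Because a state inside a word can only move to a state of the same type, every character of a word necessarily shares one entity label, which is the constraint the pair representation is meant to enforce.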

An earlier paper that uses a character-level approach is Cucerzan and Yarowsky, SIGDAT 1999. In that paper the authors used prefix and suffix tries, whereas in this paper all the characters are used.