Rbosaghz writeup of Borthwick et al.

From Cohen Courses
Revision as of 10:42, 3 September 2010 by WikiAdmin (talk | contribs) (1 revision)

This is a review of Borthwick_1998_exploiting_diverse_knowledge_sources_via_maximum_entropy_in_named_entity_recognition by user:Rbosaghz.

This paper takes a maximum entropy (maxent) approach to the named entity recognition task. The authors focus on the MUC-7 dataset and show that their system, which is purely statistically learned on its own, can beat state-of-the-art systems when coupled with some hard-coded rules. Their features look at capitalization, lexical indicators such as "Mr", section indicators such as "Preamble", and dictionary membership. For their final system they also added features derived from hard-coded rules. They investigate how to handle multi-word names such as "New York", which should be tagged as a single entity. Finally, they decode using the Viterbi algorithm. They also describe NYU's submission to a contest.
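
To make the last two points concrete, here is a minimal Python sketch of how binary features and constrained Viterbi decoding might fit together. It is not the authors' implementation. The tag set (a token's position within a single entity type), the feature names, the list of titles, and the hand-made probabilities are all assumptions made for illustration; in a real system a trained maxent model would supply the per-token distributions.

```python
import math

# Hypothetical tag set (an assumption, not taken from the review):
# the position of a token within an entity, for one entity type.
TAGS = ["other", "start", "continue", "end", "unique"]

def legal(prev, cur):
    """Return True if tag `cur` may follow tag `prev`."""
    if prev in ("start", "continue"):          # an entity is still open
        return cur in ("continue", "end")
    return cur in ("other", "start", "unique")  # no entity is open

def features(tokens, i, dictionary):
    """Binary features of the kinds the review lists (names are illustrative)."""
    tok = tokens[i]
    return {
        "capitalized": tok[:1].isupper(),
        "prev_is_title": i > 0 and tokens[i - 1] in {"Mr", "Mr.", "Ms.", "Dr."},
        "in_dictionary": tok in dictionary,
    }

def _log(p):
    """Logarithm that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf

def viterbi(probs):
    """probs[i][tag] is a per-token probability; return the best legal tag path."""
    # A sequence cannot begin in the middle of an entity.
    best = [{t: _log(probs[0][t]) if legal("other", t) else -math.inf for t in TAGS}]
    back = []
    for i in range(1, len(probs)):
        scores, ptrs = {}, {}
        for cur in TAGS:
            score, prev = max((best[-1][p], p) for p in TAGS if legal(p, cur))
            scores[cur] = score + _log(probs[i][cur])
            ptrs[cur] = prev
        best.append(scores)
        back.append(ptrs)
    # A sequence cannot end with an entity still open.
    last = max((t for t in TAGS if legal(t, "other")), key=lambda t: best[-1][t])
    path = [last]
    for ptrs in reversed(back):
        path.append(ptrs[path[-1]])
    return path[::-1]

# Hand-made distributions for the tokens "in New York today" (illustrative only).
low = {"other": 0.9, "start": 0.025, "continue": 0.025, "end": 0.025, "unique": 0.025}
new = {"other": 0.1, "start": 0.5, "continue": 0.05, "end": 0.05, "unique": 0.3}
york = {"other": 0.1, "start": 0.1, "continue": 0.05, "end": 0.35, "unique": 0.4}
print(viterbi([low, new, york, low]))  # ['other', 'start', 'end', 'other']
```

Picking the most probable tag for each token independently would give "start" for "New" followed by "unique" for "York", which is not a legal sequence. Restricting Viterbi to legal transitions instead yields "start" then "end", so "New York" comes out as one entity.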

This paper was not particularly exciting for me, but it was likely very nice in 1998.