Rbosaghz review of Krishnan 2006
This is a review of krishnan_2006_an_effective_two_stage_model_for_exploiting_non_local_dependencies_in_named_entity_recognition by user:Rbosaghz.
This paper is about exploiting non-local dependencies in the named entity recognition (NER) task.
The authors use a CRF with only local features to make predictions in a first stage, then train another CRF which uses both local information and features extracted from the output of the first CRF.
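The two-stage arrangement can be sketched roughly as follows. This is only an illustration of the data flow, not the authors' code: the function `two_stage_predict`, the `Tagger` type and the feature callables are hypothetical names, and any sequence labeller (a trained CRF in the paper) could stand in for the two tagger arguments.

```python
from typing import Callable, Dict, List, Sequence

Features = Dict[str, str]
# A tagger maps one feature dict per token to one label per token.
# In the paper both taggers are CRFs; here they are stand-in callables.
Tagger = Callable[[Sequence[Features]], List[str]]


def two_stage_predict(
    tokens: Sequence[str],
    local_features: Callable[[Sequence[str], int], Features],
    first_stage: Tagger,
    second_stage: Tagger,
    nonlocal_features: Callable[[Sequence[str], Sequence[str], int], Features],
) -> List[str]:
    """Label `tokens` with a local model, then relabel using its output."""
    # Stage 1: label the document using local features only.
    local = [local_features(tokens, i) for i in range(len(tokens))]
    first_labels = first_stage(local)

    # Stage 2: keep the local features, add features computed from the
    # first-stage labels, and label the document again.
    combined = []
    for i, feats in enumerate(local):
        merged = dict(feats)
        merged.update(nonlocal_features(tokens, first_labels, i))
        combined.append(merged)
    return second_stage(combined)
```

Training follows the same pattern: the second model is fit on feature dicts that already contain the first model's predictions.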
Their first-stage CRF is a sequence model in which the label of each token depends directly only on the labels of the previous and next tokens. They use local features that have been shown to be effective in NER, such as the current, previous and next words, character n-grams of the current word, and part-of-speech tags of the current and surrounding words. A CRF trained with these local features gives them a reasonable baseline. From there they add global (non-local) features such as Label Consistency and obtain a relative error reduction of around 12%. They compare against the state-of-the-art models at the time of writing and show that their approach is competitive.
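To make the kind of local features listed above concrete, here is a minimal sketch of a per-token feature extractor. The feature names, the padding symbols `<S>` and `</S>`, and the n-gram length limit are assumptions made for illustration; the paper's exact feature templates are not given in this review.

```python
from typing import Dict, List, Sequence


def char_ngrams(word: str, n_max: int = 3) -> List[str]:
    """All character n-grams of `word` with length 1..n_max."""
    return [
        word[i:i + n]
        for n in range(1, n_max + 1)
        for i in range(len(word) - n + 1)
    ]


def local_features(
    words: Sequence[str], pos: Sequence[str], i: int
) -> Dict[str, object]:
    """Features for token i drawn only from a one-token window around it."""

    def at(seq: Sequence[str], j: int, pad: str) -> str:
        # Return seq[j], or a padding symbol when j is outside the sentence.
        return seq[j] if 0 <= j < len(seq) else pad

    feats: Dict[str, object] = {
        "word": words[i],
        "prev_word": at(words, i - 1, "<S>"),
        "next_word": at(words, i + 1, "</S>"),
        "pos": pos[i],
        "prev_pos": at(pos, i - 1, "<S>"),
        "next_pos": at(pos, i + 1, "</S>"),
    }
    # One binary indicator per character n-gram of the current word.
    for gram in char_ngrams(words[i]):
        feats["ngram=" + gram] = True
    return feats
```

Every feature here looks at most one token to the left or right, which is what makes the first-stage model purely local.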
An example of a non-local feature they use is Label Consistency, which is motivated by the observation that, within a particular document, different occurrences of a particular token sequence (or of similar token sequences) are unlikely to have different entity labels.
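One simple way to turn this observation into a feature is to give each token the label that the first-stage model most often assigned to the same token elsewhere in the document. The sketch below does this for single tokens only, with exact string matching; the function name `token_majority_features` and the feature key are hypothetical, and the paper's actual feature set is not reproduced here.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def token_majority_features(
    tokens: Sequence[str], first_stage_labels: Sequence[str]
) -> List[Dict[str, str]]:
    """For each token, the majority first-stage label of that token
    across the whole document (assumed to be one list of tokens)."""
    # Count how often each label was predicted for each distinct token.
    counts: Dict[str, Counter] = defaultdict(Counter)
    for tok, lab in zip(tokens, first_stage_labels):
        counts[tok][lab] += 1

    feats = []
    for tok in tokens:
        # most_common(1) returns the label with the highest count;
        # ties are broken by first occurrence in the document.
        majority, _ = counts[tok].most_common(1)[0]
        feats.append({"token_majority": majority})
    return feats
```

If the first stage labels "Melbourne" as ORG twice and LOC once in the same document, all three occurrences receive `token_majority = ORG`, which lets the second-stage CRF correct the inconsistent one.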