Sgardine writesup Krishnan 2006
This is a review of krishnan_2006_an_effective_two_stage_model_for_exploiting_non_local_dependencies_in_named_entity_recognition by user:Sgardine.
Summary
NER models often err by tagging the same entity inconsistently. Some earlier work approached the problem by modelling long-range dependencies explicitly, which forces exact inference to be replaced with approximate inference and degrades both runtime and accuracy. Here the authors propose a two-stage model: the first stage is a standard CRF, and the second is the same CRF augmented with additional features derived from the labels output by the first. The second CRF can thereby learn weights for label consistency among similar entities at the document and corpus levels. The system was evaluated on NER over the CoNLL 2003 dataset. Adding document-level consistency features reduced error by 12.6%, and adding corpus-level consistency features gained an additional percent or so.
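As a rough illustration of how second-stage features can be derived from first-stage labels, here is a minimal sketch of a document-level majority feature. It is a simplification for this writeup, not the paper's exact feature set: the function name `token_majority_features`, the case-folding, and the toy sentence are all assumptions.

```python
from collections import Counter, defaultdict

def token_majority_features(tokens, first_stage_labels):
    """For each token, return the label that the first-stage model most
    often assigned to the same (case-folded) token string in this document.
    Hypothetical helper; the second CRF would consume this as one extra
    feature per token."""
    counts = defaultdict(Counter)
    for tok, lab in zip(tokens, first_stage_labels):
        counts[tok.lower()][lab] += 1
    return [counts[tok.lower()].most_common(1)[0][0] for tok in tokens]

# Toy document: the first stage labels "Melbourne" inconsistently.
tokens = ["Melbourne", "beat", "Sydney", "before", "Melbourne", "lost", "to", "Melbourne"]
stage1 = ["ORG", "O", "ORG", "O", "LOC", "O", "O", "ORG"]

majority = token_majority_features(tokens, stage1)
# The lone LOC token now carries a feature saying that most other
# occurrences of the same string were tagged ORG.
```

The second CRF is not forced to follow the majority; it only sees it as evidence and learns how much weight to give it. A corpus-level version would presumably pool the counts over all documents rather than one.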
Commentary
I liked the discussion of how often inconsistencies occur in the data.
In a sense the first CRF just provides richer features for the second, where those features are noisy indications of how each entity is labeled across the data. As an upper bound, it would be interesting to see what accuracy a CRF would achieve with access to the true labels of the entity (i.e. how helpful these features could possibly be in the unrealizable absence of noise); this might indicate how hard we should work on improving the accuracy of the first model.
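A cheap proxy for this question, short of retraining the second CRF on gold-derived features, is to measure how often the noisy consistency feature disagrees with its oracle counterpart. The sketch below does only that; the names `majority_by_token` and `feature_noise_rate` and the toy data are assumptions made for illustration, and the number it produces is not a result from the paper.

```python
from collections import Counter, defaultdict

def majority_by_token(tokens, labels):
    """Majority label per token string within one document."""
    counts = defaultdict(Counter)
    for tok, lab in zip(tokens, labels):
        counts[tok][lab] += 1
    return [counts[tok].most_common(1)[0][0] for tok in tokens]

def feature_noise_rate(tokens, predicted, gold):
    """Fraction of tokens whose majority feature differs when computed
    from first-stage predictions instead of gold labels."""
    noisy = majority_by_token(tokens, predicted)
    oracle = majority_by_token(tokens, gold)
    mismatches = sum(n != o for n, o in zip(noisy, oracle))
    return mismatches / len(tokens)

# Toy document where the first stage is wrong on most mentions of "Reuters".
tokens = ["Reuters", "said", "Reuters", "will", "Reuters"]
predicted = ["ORG", "O", "PER", "O", "PER"]
gold = ["ORG", "O", "ORG", "O", "ORG"]

rate = feature_noise_rate(tokens, predicted, gold)
```

A high rate would suggest the second stage is being fed misleading evidence, so improving the first model could pay off; a low rate would suggest the features are already close to their noise-free form.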