Bbd writeup of Vijay Krishnan et al.
This is a review of Krishnan_2006_an_effective_two_stage_model_for_exploiting_non_local_dependencies_in_named_entity_recognition by user:bbd.
This paper proposes an efficient technique for modeling non-local dependencies. It outperforms the best previous NER systems that use non-local features, which rely on complicated models with approximate inference. The basic technique has two parts: (1) run a CRF-based NER system with local features to make predictions; (2) train a second CRF that takes the local features, the non-local features, and the output of the first CRF. The non-local features can come from the same document or from other documents.
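The data flow of the two steps can be sketched as follows. This is only an illustration under stated assumptions: `Tagger`, `local_features`, `two_stage_features`, and `toy_stage1` are hypothetical names, and the toy tagger is a placeholder rule standing in for a trained first-stage CRF, not the paper's model.

```python
from typing import Callable, Dict, List

Features = Dict[str, str]
# Hypothetical stand-in for a trained CRF: feature dicts in, one label per token out.
Tagger = Callable[[List[Features]], List[str]]


def local_features(tokens: List[str]) -> List[Features]:
    """Toy local features: the lowercased word and a capitalization flag."""
    return [{"word": t.lower(), "is_cap": str(t[:1].isupper())} for t in tokens]


def two_stage_features(tokens: List[str], stage1: Tagger) -> List[Features]:
    """Build the second-stage input: local features plus the first-stage label."""
    feats = local_features(tokens)
    stage1_labels = stage1(feats)  # step 1: predictions from local features only
    # Step 2 input: each token keeps its local features and gains the stage-1 output.
    return [dict(f, stage1_label=lab) for f, lab in zip(feats, stage1_labels)]


def toy_stage1(feats: List[Features]) -> List[str]:
    """Placeholder tagger (assumption): capitalized tokens are PER, the rest O."""
    return ["PER" if f["is_cap"] == "True" else "O" for f in feats]


augmented = two_stage_features(["Krishnan", "spoke", "today"], toy_stage1)
```

In a real system the second CRF would be trained on feature dictionaries like `augmented`, extended with the non-local features computed from the first-stage predictions.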
I liked this technique because it addresses the label-consistency and subsequence constraints and gives a solution whose running time equals the time required to run two simple CRF models. Also, based on the predictions of the first CRF, the authors define nice feature categories: token-majority features, entity-majority features, and superentity-majority features. If the labels of some entities are inconsistent (for example, an organization-location inconsistency), the model will automatically weigh those features less and learn the right model.
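To make the token-majority idea concrete, here is a minimal sketch of how such a feature could be computed at the document level from first-stage predictions. The function name `token_majority` and the example document are hypothetical, and the paper's exact feature definitions may differ in detail.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def token_majority(tagged: List[Tuple[str, str]]) -> Dict[str, str]:
    """Map each token to the label the first-stage tagger gave it most often."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for token, label in tagged:
        counts[token][label] += 1
    return {tok: c.most_common(1)[0][0] for tok, c in counts.items()}


# Hypothetical first-stage output for one document: (token, predicted label).
doc = [("Melbourne", "LOC"), ("beat", "O"), ("Sydney", "LOC"),
       ("Melbourne", "ORG"), ("fans", "O"), ("Melbourne", "LOC")]

majority = token_majority(doc)
# Feature for the second CRF: each token's document-level majority label.
features = [{"token": tok, "token_majority": majority[tok]} for tok, _ in doc]
```

Here the second occurrence of "Melbourne" was tagged ORG by the first stage, but its majority feature is LOC, so the second CRF sees evidence from the other occurrences and can learn how much to trust it.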