Nschneid writeup of Krishnan 2006
This is Nschneid's review of Krishnan (2006), "An Effective Two-Stage Model for Exploiting Non-Local Dependencies in Named Entity Recognition".
- We present a simple two-stage approach where our second CRF uses features derived from the output of the first CRF. This gives us the advantage of defining a rich set of features to model non-local dependencies, and also eliminates the need to do approximate inference, since we do not explicitly capture the non-local dependencies in a single model, like the more complex existing approaches. This also enables us to do inference efficiently since our inference time is merely the inference time of two sequential CRF’s; in contrast Finkel et al. (2005) reported an increase in running time by a factor of 30 over the sequential CRF, with their Gibbs sampling approximate inference.
- In all, our approach is simpler, yields higher F1 scores, and is also much more computationally efficient than existing approaches modeling non-local dependencies.
Similar to the skip-chain CRF model for NER, except that nonlocal agreement between tags for the same word is encouraged by way of a second-stage (stacked?) CRF whose features are derived from the first CRF's output, which is both more efficient and more accurate. The nonlocal features break down into token-majority features, entity-majority features (an entity sequence is a token sequence judged by the first-stage CRF to be a single entity), and superentity-majority features (for the span of words in an entity sequence, there may be superentity sequences, i.e. larger spans containing it as a subspan). Each of these is instantiated at both the document level and the corpus level.
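A minimal sketch of how the second-stage features might be derived from first-stage output, assuming simple IO-style labels and document-level aggregation only (function names and the `TOK_MAJ=` feature string are my own inventions for illustration, not from the paper):

```python
from collections import Counter, defaultdict

def token_majority_features(tokens, labels):
    """For each token position, emit the majority first-stage label
    assigned to that token type across the whole document."""
    votes = defaultdict(Counter)
    for tok, lab in zip(tokens, labels):
        votes[tok][lab] += 1
    return [f"TOK_MAJ={votes[tok].most_common(1)[0][0]}" for tok in tokens]

def entity_spans(labels):
    """Extract (start, end, type) entity sequences from IO-style labels;
    entity-majority features would then vote over identical spans."""
    spans, start, cur = [], None, None
    for i, lab in enumerate(labels + ["O"]):  # sentinel closes a final span
        if lab != cur:
            if cur is not None and cur != "O":
                spans.append((start, i, cur))
            start, cur = i, lab
    return spans

tokens = ["Tanjug", "said", "Tanjug", "Tanjug"]
labels = ["ORG", "O", "ORG", "PER"]  # hypothetical first-stage output
print(token_majority_features(tokens, labels))
print(entity_spans(labels))
```

The second CRF would consume these features alongside the usual local ones, so disagreeing occurrences of the same token (here the spurious `PER`) are pulled toward the document-level majority label.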
- How is this different from stacking (minus the cross-validation approach)?