Nschneid writeup of Krishnan 2006

This is Nschneid's review of Krishnan_2006_an_effective_two_stage_model_for_exploiting_non_local_dependencies_in_named_entity_recognition

From the paper's abstract:

We present a simple two-stage approach where our second CRF uses features derived from the output of the first CRF. This gives us the advantage of defining a rich set of features to model non-local dependencies, and also eliminates the need to do approximate inference, since we do not explicitly capture the non-local dependencies in a single model, like the more complex existing approaches. This also enables us to do inference efficiently since our inference time is merely the inference time of two sequential CRF’s; in contrast Finkel et al. (2005) reported an increase in running time by a factor of 30 over the sequential CRF, with their Gibbs sampling approximate inference.
In all, our approach is simpler, yields higher F1 scores, and is also much more computationally efficient than existing approaches modeling non-local dependencies.

Similar in goal to the skip-chain CRF model for NER, except that nonlocal agreement between the tags assigned to occurrences of the same word is encouraged via a second-stage (stacked?) CRF whose features are derived from the output of a first, standard sequential CRF; this is both more efficient and more accurate. The nonlocal features break down into token-majority features, entity-majority features (an entity sequence is a token sequence judged to be a single entity by the first-stage CRF), and superentity-majority features (for the span of words in an entity sequence, there may be superentity sequences, i.e. larger spans containing it as a subspan). Each of these is instantiated at both the document level and the corpus level.
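
To make the feature construction concrete, here is a rough sketch (my own illustration, not code from the paper; the run-based span extraction and the simplified BIO-less labels are assumptions) of how document-level token-majority and entity-majority lookups could be derived from the first CRF's output:

```python
from collections import Counter, defaultdict

def entity_spans(tokens, labels):
    """Maximal runs of identically-labelled non-O tokens, i.e. the entity
    sequences as judged by the first-stage CRF."""
    spans, start = [], None
    for i in range(len(labels) + 1):
        lab = labels[i] if i < len(labels) else "O"
        if start is not None and (i == len(labels) or lab != labels[start]):
            spans.append((tuple(tokens[start:i]), labels[start]))
            start = None
        if i < len(labels) and lab != "O" and start is None:
            start = i
    return spans

def majority_vote(pairs):
    """Map each key to the label it was most often assigned."""
    votes = defaultdict(Counter)
    for key, lab in pairs:
        votes[key][lab] += 1
    return {key: c.most_common(1)[0][0] for key, c in votes.items()}

def document_majority_features(tokens, first_stage_labels):
    """Document-level token-majority and entity-majority lookups derived
    from the first CRF's output; corpus-level versions would pool the
    (token, label) and (entity, label) pairs over all documents instead."""
    token_maj = majority_vote(zip(tokens, first_stage_labels))
    entity_maj = majority_vote(entity_spans(tokens, first_stage_labels))
    return token_maj, entity_maj

# Toy example: "Bank" is tagged ORG twice and O once in this document,
# so every occurrence of "Bank" gets a token-majority=ORG feature.
tokens = ["World", "Bank", "says", "Bank", "lending", "to", "Bank", "X"]
labels = ["ORG",   "ORG",  "O",    "ORG",  "O",       "O",  "O",    "O"]
token_maj, entity_maj = document_majority_features(tokens, labels)
print(token_maj["Bank"])              # ORG
print(entity_maj[("World", "Bank")])  # ORG
```

At the second stage, each token would presumably fire indicator features for its document-level and corpus-level majority labels (and likewise for the entity sequence containing it), alongside the usual local features; superentity-majority features would analogously look up larger spans that contain the current entity sequence, which the sketch omits.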

  • How is this different from stacking (minus the cross-validation approach)?