Mnduong writeup of Krishnan & Manning 2006

This paper introduces another approach to modeling long-range dependencies in named-entity recognition. It is motivated by the observation that different occurrences of the same token, or of similar tokens, in the same document are likely to have the same label. It also aims to be more efficient than existing methods, which are not only much slower than linear-chain CRFs but also depend on approximate inference.
The method has two stages. The first stage is a regular CRF run over the input with the usual local features. The second stage is another CRF whose features are augmented with information derived from the output of the first stage. In particular, it uses document-wide information about how the first CRF labeled similar tokens, entities and superentities.
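As a rough illustration of the second-stage features, the sketch below computes a token-level majority label from first-stage predictions and attaches it to each position. It is not the authors' implementation: the function names, the feature names, the case-insensitive matching, and the restriction to token-level (rather than entity- or superentity-level) aggregation are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def token_majority_labels(tokens, first_stage_labels):
    """For each position, return the label the first-stage model assigned
    most often to that token within the document.

    Tokens are matched case-insensitively (an assumption of this sketch).
    """
    counts = defaultdict(Counter)
    for tok, lab in zip(tokens, first_stage_labels):
        counts[tok.lower()][lab] += 1
    return [counts[tok.lower()].most_common(1)[0][0] for tok in tokens]


def second_stage_features(tokens, first_stage_labels):
    """Build per-token feature dicts for a hypothetical second-stage CRF:
    a local word feature plus the document-level majority label."""
    majority = token_majority_labels(tokens, first_stage_labels)
    return [
        {"word": tok.lower(), "token_majority": maj}
        for tok, maj in zip(tokens, majority)
    ]


# Toy document: the first stage labels "Paris" inconsistently.
tokens = ["Paris", "visited", "Paris", "Hilton", "in", "Paris"]
first_stage = ["LOC", "O", "LOC", "PER", "O", "PER"]
features = second_stage_features(tokens, first_stage)
```

Here the last "Paris" was labeled PER by the first stage, but its majority feature is LOC, which gives the second CRF evidence to correct the inconsistency without adding any edges to the graph.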
The method achieves better F1 scores than a baseline linear-chain CRF and than two other methods that exploit non-local information. It also runs faster than the other non-local methods, because it only requires a second CRF pass and introduces no additional edges into the graph.
Overall, this is a simple and intuitive method with attractive performance results. I would like to see this method compared against skip-chain CRFs. Another minor point is that the baseline linear-chain CRF the authors implemented had performance that was only "close to the best published local CRF models", not exactly as good. It would be helpful to know how large this difference is, and which paper reported the best-performing CRF.
