Nschneid writeup of Borthwick 1998
This is Nschneid's review of Borthwick_1998_exploiting_diverse_knowledge_sources_via_maximum_entropy_in_named_entity_recognition
A maxent model for NER that debuted at MUC-7. (This and another MUC-7 system were the first to use maxent for NER; Ratnaparkhi had already done so for POS tagging and parsing.) This work predates CRFs, so the model has no features over tag bigrams. However, the authors do a sort of stacking, with features that incorporate the predictions of other models. They note that their system can learn the weaknesses of other systems in this way. It is unclear whether they exploit this to include pseudo-bigram features (i.e. features pairing a possible tag with another model's prediction for the previous tag). The system recognizes 7 entity types, and Viterbi decoding is used to find the most probable legal BIO-style tag sequence.
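Because there are no tag-bigram features, the only thing tying neighboring positions together at decoding time is the legality constraint on the tag sequence. The sketch below illustrates that idea; it is not MENE's implementation. It assumes a plain BIO tag set, assumes every position's distribution uses the same tags, and uses hypothetical function names (`is_legal`, `constrained_viterbi`). The per-token probabilities stand in for the maxent classifier's outputs.

```python
import math
from typing import Dict, List


def is_legal(prev: str, cur: str) -> bool:
    """Assumed BIO constraint: I-X may only follow B-X or I-X."""
    if cur.startswith("I-"):
        etype = cur[2:]
        return prev in (f"B-{etype}", f"I-{etype}")
    return True


def constrained_viterbi(probs: List[Dict[str, float]]) -> List[str]:
    """Return the highest-probability tag sequence that satisfies is_legal.

    probs[i][tag] is a per-token probability P(tag | context_i), e.g. from a
    maxent classifier. There are no transition scores: a sequence's score is
    just the product of its per-token probabilities, restricted to legal
    sequences. All positions are assumed to share the tag set of probs[0].
    """
    tags = list(probs[0].keys())
    # best[t] = (log score, path) of the best legal path ending in tag t.
    # The sentence start is treated like a preceding "O" tag.
    best = {}
    for t in tags:
        if is_legal("O", t) and probs[0][t] > 0:
            best[t] = (math.log(probs[0][t]), [t])
    for dist in probs[1:]:
        new_best = {}
        for t in tags:
            p = dist.get(t, 0.0)
            if p <= 0:
                continue
            # Extend every legal predecessor path with tag t.
            candidates = [
                (score + math.log(p), path + [t])
                for prev, (score, path) in best.items()
                if is_legal(prev, t)
            ]
            if candidates:
                new_best[t] = max(candidates, key=lambda c: c[0])
        best = new_best
    return max(best.values(), key=lambda c: c[0])[1]


# Toy distributions: per-token argmax gives I-PER, I-PER, O, which is illegal
# because the sequence would open with I-PER.
probs = [
    {"B-PER": 0.4, "I-PER": 0.5, "O": 0.1},
    {"B-PER": 0.1, "I-PER": 0.7, "O": 0.2},
    {"B-PER": 0.1, "I-PER": 0.1, "O": 0.8},
]
print(constrained_viterbi(probs))  # ['B-PER', 'I-PER', 'O']
```

Here the constraint overrides the locally preferred I-PER at the first token, and the decoder picks B-PER, I-PER, O (probability 0.4 × 0.7 × 0.8 = 0.224) as the best legal sequence.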
It is of historical interest that the authors note:
- In comparing the maximum entropy and HMM-based approaches to named entity recognition, we are hopeful that M.E. will turn out to be the better method in the end. We think it is possible that some of Identifinder's current advantage can be neutralized by simply adding the just-mentioned features to MENE [our system]. On the other hand, we have a harder time seeing how some of MENE's strengths can be integrated into an HMM-based system.
They would only have to wait 3 more years for an answer.
- The class only has to wait a week :-)