Cucerzan and Yarowsky, SIGDAT 1999

Citation

Cucerzan, S. and Yarowsky, D. 1999. Language independent named entity recognition combining morphological and contextual evidence. In Proceedings of the Joint SIGDAT Conference on EMNLP and VLC (1999), pp. 90-99.

Online version

ACL Anthology

Summary

This paper describes a language-independent, EM-style bootstrapping algorithm for building a named entity recognizer. Since some morphological information and contextual patterns are good indicators of certain named entity classes, the bootstrapping algorithm iteratively learns from both the word-internal and the contextual information of entities.

The authors experimented with five languages: English, Romanian, Greek, Turkish and Hindi. For each entity class, they provide a short list of seeds. They also use some basic particularities of each language, such as capitalization, word separators and language-related exceptions. Starting from these seeds, bootstrapping makes use of the word-internal and contextual information of each entity.
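The idea of growing a labeled set from seeds using both word-internal and contextual evidence can be illustrated with a minimal sketch. This is not the authors' algorithm: it replaces their morphological and contextual tries with flat suffix and neighboring-word features, and their probabilistic re-estimation with simple normalized voting. The names `features` and `bootstrap`, the `threshold` parameter, and the toy corpus are all assumptions made for illustration.

```python
from collections import defaultdict


def features(sent, i, max_suffix=3):
    """Word-internal (suffix) and contextual (left/right word) features for token i."""
    word = sent[i].lower()
    feats = ["suf:" + word[-k:] for k in range(1, min(max_suffix, len(word)) + 1)]
    feats.append("L:" + (sent[i - 1].lower() if i > 0 else "<s>"))
    feats.append("R:" + (sent[i + 1].lower() if i + 1 < len(sent) else "</s>"))
    return feats


def bootstrap(sentences, seeds, rounds=3, threshold=0.6):
    """Grow a word -> class mapping from seeds (illustrative simplification)."""
    labels = dict(seeds)
    for _ in range(rounds):
        # Re-estimate feature/class counts from all currently labeled words.
        counts = defaultdict(lambda: defaultdict(float))
        for sent in sentences:
            for i, tok in enumerate(sent):
                if tok in labels:
                    for f in features(sent, i):
                        counts[f][labels[tok]] += 1.0

        # Score unlabeled capitalized tokens with the evidence gathered so far.
        votes = defaultdict(lambda: defaultdict(float))
        for sent in sentences:
            for i, tok in enumerate(sent):
                if tok in labels or not tok[:1].isupper():
                    continue
                for f in features(sent, i):
                    if f in counts:
                        total = sum(counts[f].values())
                        for cls, c in counts[f].items():
                            votes[tok][cls] += c / total

        # Accept only confident candidates; stop when nothing new is learned.
        added = False
        for tok, by_cls in votes.items():
            best = max(by_cls, key=by_cls.get)
            if by_cls[best] / sum(by_cls.values()) >= threshold:
                labels[tok] = best
                added = True
        if not added:
            break
    return labels


# Toy corpus and seed list (hypothetical data, for illustration only).
corpus = [
    ["mr", "Smith", "lives", "in", "London"],
    ["mr", "Jones", "lives", "in", "Paris"],
    ["Popescu", "spoke"],
    ["Ionescu", "arrived"],
]
seeds = {"Smith": "person", "London": "place", "Popescu": "person"}
print(bootstrap(corpus, seeds))
```

In this toy run, "Jones" and "Paris" are labeled from their contexts alone, while "Ionescu" is labeled mainly from the suffix it shares with the seed "Popescu", which is the kind of complementary evidence the summary describes.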


Algorithm.png


For all five languages, using the context and morphology tries together gives better accuracy than using only one of them. Furthermore, boosting improves the results for all languages. Experiments with the training set size showed that more training data improves overall accuracy, since classifications become more accurate. Increasing the length of the provided seed list also improved the F-score.

Related Papers