Cucerzan and Yarowsky, SIGDAT 1999
Citation
Cucerzan, S. and Yarowsky, D. 1999. Language independent named entity recognition combining morphological and contextual evidence. In Proceedings of the Joint SIGDAT Conference on EMNLP and VLC (1999), pp. 90-99.
Online version
Summary
This paper describes a language-independent, EM-style bootstrapping algorithm for building a named entity recognizer. Because some morphological (word-internal) cues and contextual patterns are good indicators of particular named entity classes, the algorithm iteratively learns from both the word-internal and the contextual evidence of entities.
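To make the idea concrete, here is a minimal Python sketch of seed-based bootstrapping that pools the two kinds of evidence. It is a simplified, hard-assignment self-training loop, not the authors' EM-style algorithm: the feature set (a three-character suffix as the word-internal cue, the neighbouring words as context), the `threshold` parameter, the restriction to capitalized tokens, and the function names `features` and `bootstrap` are all illustrative assumptions.

```python
from collections import Counter, defaultdict


def features(tokens, i):
    """Hypothetical feature set: word-internal and contextual evidence for tokens[i]."""
    word = tokens[i]
    left = tokens[i - 1].lower() if i > 0 else "<s>"
    right = tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"
    return [
        ("suffix", word[-3:].lower()),  # word-internal (morphological) evidence
        ("left", left),                 # contextual evidence: previous word
        ("right", right),               # contextual evidence: next word
    ]


def bootstrap(sentences, seeds, iterations=3, threshold=0.6):
    """Grow a word -> class lexicon from seed words (type-level labels).

    sentences: list of token lists; seeds: dict mapping word -> class.
    """
    labels = dict(seeds)
    for _ in range(iterations):
        # 1. Count how often each feature co-occurs with each class
        #    among the currently labeled tokens.
        evidence = defaultdict(Counter)
        for toks in sentences:
            for i, w in enumerate(toks):
                if w in labels:
                    for f in features(toks, i):
                        evidence[f][labels[w]] += 1

        # 2. Score unlabeled capitalized tokens by the normalized class
        #    distribution of their features; keep only confident decisions.
        new_labels = {}
        for toks in sentences:
            for i, w in enumerate(toks):
                if w in labels or not w[:1].isupper():
                    continue
                votes = Counter()
                for f in features(toks, i):
                    counts = evidence.get(f)
                    if counts:
                        total = sum(counts.values())
                        for cls, n in counts.items():
                            votes[cls] += n / total
                if votes:
                    cls, score = votes.most_common(1)[0]
                    if score / sum(votes.values()) >= threshold:
                        new_labels[w] = cls

        if not new_labels:  # nothing new was learned: stop early
            break
        labels.update(new_labels)
    return labels
```

On a toy corpus such as `[["He", "lives", "in", "Paris", "today"], ["She", "moved", "to", "London", "today"], ["Mr", "Smith", "said", "hello"], ["Mr", "Jones", "said", "goodbye"], ["He", "lives", "in", "Berlin", "now"]]`, seeding `Paris` as LOC and `Smith` as PER lets the loop label `London` and `Berlin` as LOC (shared contexts `_ today` and `in _`) and `Jones` as PER (shared context `Mr _ said`). Each newly labeled word then contributes its own suffix and context features in the next iteration, which is the self-reinforcing step that bootstrapping relies on.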
The authors experimented with five languages: English, Romanian, Greek, Turkish and Hindi. For each entity class, they provide a short list of seeds. They also use some basic particularities of each language, such as capitalization, word separators and language-specific exceptions. Starting from these seeds, bootstrapping exploits both the word-internal and the contextual information of each entity.
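The inputs described above can be sketched as a per-language configuration plus a seed list. The `LanguageConfig` class, its field names, the separator regex, and the `candidates` and `seed_matches` helpers below are hypothetical, chosen only to illustrate how capitalization, word separators, exception lists and seeds could define the starting point for bootstrapping; the paper's actual preprocessing may differ.

```python
import re
from dataclasses import dataclass, field


@dataclass
class LanguageConfig:
    """Hypothetical per-language settings ('basic particularities')."""
    separators: str = r'[\s.,;:!?()"]+'   # regex used to split text into words
    use_capitalization: bool = True       # set False for text without letter case
    exceptions: set = field(default_factory=set)  # words never treated as entities


def candidates(text, cfg):
    """Return (position, token) pairs that may be named entities."""
    tokens = [t for t in re.split(cfg.separators, text) if t]
    out = []
    for i, tok in enumerate(tokens):
        if tok in cfg.exceptions:
            continue  # language-specific exception list
        if cfg.use_capitalization and not tok[:1].isupper():
            continue  # capitalization filter, only where the script has case
        out.append((i, tok))
    return out


def seed_matches(cands, seeds):
    """Label candidate positions whose token appears in a class's seed list."""
    lookup = {w: cls for cls, words in seeds.items() for w in words}
    return {i: lookup[tok] for i, tok in cands if tok in lookup}
```

For example, with `exceptions={"The"}`, the sentence `"The mayor of Paris met Smith."` yields the candidates `[(3, "Paris"), (5, "Smith")]`, and seeds `{"LOC": ["Paris"], "PER": ["Smith"]}` label them LOC and PER. Setting `use_capitalization=False` lets the same code run on text without letter case, where every non-exception token becomes a candidate and the later bootstrapping evidence has to do all of the filtering.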