Carreras et al, CoNLL 2003



Xavier Carreras, Lluís Màrquez, and Lluís Padró. 2003. A simple named entity extractor using AdaBoost. In Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4 (CONLL '03), Vol. 4. Association for Computational Linguistics, Stroudsburg, PA, USA, 152-155.

Online version



In this paper, the authors propose a simple AdaBoost-based approach to Named Entity Recognition. The approach solves the problem in two sub-steps. The first is recognition, in which three binary classifiers are used to label each word as one of B, I, or O (beginning of, inside, or outside a named entity). The second is NE classification, which is done using multiclass learning.
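The two sub-steps meet at the point where a B/I/O tag sequence is turned into candidate entity spans for the classifier. The following is a minimal sketch of that hand-off, not code from the paper; the function name `bio_to_spans` and the handling of a stray I tag are assumptions made for illustration.

```python
def bio_to_spans(tags):
    """Convert a B/I/O tag sequence into (start, end) spans, end exclusive."""
    spans = []
    start = None  # index where the currently open entity began, if any
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            # A new entity begins. A stray I with no open entity is
            # treated like B (an assumption of this sketch).
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "O":
            if start is not None:
                spans.append((start, i))
                start = None
        # An I tag with an open entity simply extends it.
    if start is not None:
        spans.append((start, len(tags)))
    return spans


tokens = ["Xavier", "Carreras", "lives", "in", "Barcelona"]
tags = ["B", "I", "O", "O", "B"]
candidates = [tokens[s:e] for s, e in bio_to_spans(tags)]
print(candidates)
```

Each candidate span would then be passed to the classification step to receive its entity type.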

Brief description of the method

The features for this task are drawn from the context of the current word. Each word in the neighbourhood is encoded as a feature together with its relative position. The kinds of features used were: Lexical, Syntactic, Orthographic, Affixes, Word Type Patterns, Left Predictions, Bag-of-Words, Trigger Words, and Gazetteer features.
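As an illustration of encoding context words together with their relative position, here is a small sketch. It covers only a few lexical, orthographic, and affix features; the feature-name format, the window radius, and the function name `window_features` are assumptions, not details taken from the paper.

```python
def window_features(tokens, i, radius=3):
    """Encode words around position i, each tagged with its relative offset."""
    feats = []
    for offset in range(-radius, radius + 1):
        j = i + offset
        if 0 <= j < len(tokens):
            word = tokens[j]
            # Lexical feature: the word form at this relative position.
            feats.append(f"word@{offset:+d}={word.lower()}")
            # Simple orthographic features, also tied to the offset.
            if word[:1].isupper():
                feats.append(f"initcap@{offset:+d}")
            if word.isdigit():
                feats.append(f"alldigits@{offset:+d}")
    # Affixes of the focus word only (a simplification for this sketch).
    focus = tokens[i]
    feats.append(f"prefix3={focus[:3].lower()}")
    feats.append(f"suffix3={focus[-3:].lower()}")
    return feats


tokens = ["Xavier", "works", "in", "Barcelona"]
print(window_features(tokens, 3, radius=1))
```

Because the offset is part of the feature name, the same word contributes a different feature depending on whether it precedes or follows the focus word.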

The recognition task was performed by three independent binary one-vs-all classifiers, one for each of the B, I, and O tags. For each word, the tag of the classifier with the maximum confidence was used.
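The maximum-confidence selection can be sketched as follows. The three scorers here are hypothetical hand-written rules standing in for trained binary classifiers, and all names (`pick_tag`, `tag_sentence`, `toy_scorers`) are assumptions for illustration only.

```python
def pick_tag(confidences):
    """Return the tag whose binary classifier reports the highest confidence."""
    return max(confidences, key=confidences.get)


def tag_sentence(tokens, scorers):
    """Tag every token independently with the most confident B/I/O scorer."""
    tags = []
    for i in range(len(tokens)):
        confidences = {tag: scorer(tokens, i) for tag, scorer in scorers.items()}
        tags.append(pick_tag(confidences))
    return tags


def is_cap(tokens, i):
    """True if position i exists and its token starts with an uppercase letter."""
    return 0 <= i < len(tokens) and tokens[i][:1].isupper()


# Hypothetical scorers: real ones would be trained classifiers returning
# a real-valued confidence for "this word has tag B" (or I, or O).
toy_scorers = {
    "B": lambda toks, i: 1.0 if is_cap(toks, i) and not is_cap(toks, i - 1) else -1.0,
    "I": lambda toks, i: 1.0 if is_cap(toks, i) and is_cap(toks, i - 1) else -1.0,
    "O": lambda toks, i: 1.0 if not is_cap(toks, i) else -0.5,
}

sentence = ["Xavier", "Carreras", "lives", "in", "Barcelona"]
print(tag_sentence(sentence, toy_scorers))
```

Since each classifier is binary, more than one (or none) may fire on a given word; taking the maximum confidence resolves such conflicts with a single tag per word.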

In the Named Entity Classification module, the multiclass multilabel AdaBoost.MH algorithm was used. The algorithm was run in different configurations, such as three-class and four-class classification; the latter performed better.
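To make the AdaBoost.MH idea concrete, below is a compact sketch in the usual confidence-rated formulation: a weight is kept for every (example, label) pair, each round picks the weak rule that minimises the normaliser Z, and the pair weights are then updated. The weak rules here are single-feature stumps over binary features, and the smoothing constant, toy data, and function names are all assumptions of this sketch; it is not claimed to match the weak learners or settings used in the paper.

```python
import math


def train_adaboost_mh(X, Y, labels, rounds=5, eps=1e-3):
    """X: list of feature sets; Y: list of label sets. Returns a list of stumps."""
    n = len(X)
    features = sorted(set().union(*X))
    # One weight per (example, label) pair, initially uniform.
    D = {(i, l): 1.0 / (n * len(labels)) for i in range(n) for l in labels}
    model = []
    for _ in range(rounds):
        best = None
        for f in features:
            # W[block][label] = [weight where label applies, weight where it does not]
            W = {b: {l: [0.0, 0.0] for l in labels} for b in (True, False)}
            for i in range(n):
                block = f in X[i]
                for l in labels:
                    W[block][l][0 if l in Y[i] else 1] += D[i, l]
            Z = 2 * sum(math.sqrt(wp * wn) for b in W for wp, wn in W[b].values())
            if best is None or Z < best[0]:
                best = (Z, f, W)
        _, f, W = best
        # Confidence-rated prediction for each (block, label); eps smooths zeros.
        c = {b: {l: 0.5 * math.log((wp + eps) / (wn + eps))
                 for l, (wp, wn) in W[b].items()} for b in W}
        model.append((f, c))
        # Down-weight pairs the stump gets right, up-weight the ones it gets wrong.
        for i in range(n):
            block = f in X[i]
            for l in labels:
                y = 1 if l in Y[i] else -1
                D[i, l] *= math.exp(-y * c[block][l])
        total = sum(D.values())
        for key in D:
            D[key] /= total
    return model


def score(model, x, labels):
    """Sum the stump confidences for each label on feature set x."""
    return {l: sum(c[f in x][l] for f, c in model) for l in labels}


# Toy data with made-up feature names, for illustration only.
X = [{"cap", "gaz_loc"}, {"cap", "gaz_per"}, {"cap", "trigger_org"},
     {"cap", "gaz_loc", "suffix_ona"}]
Y = [{"LOC"}, {"PER"}, {"ORG"}, {"LOC"}]
labels = ["LOC", "PER", "ORG"]

model = train_adaboost_mh(X, Y, labels)
scores = score(model, {"cap", "gaz_loc"}, labels)
print(max(scores, key=scores.get), scores)
```

Because the scores are per label, the same model can be read as multiclass (take the highest-scoring label) or multilabel (take every label with a positive score).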

Experimental Results

On the recognition task, results were better for English than for German. On English, the system reached approximately 95% precision and recall.

On the classification task, they achieved 95.14% accuracy for English and 85.12% for German.