Benajiba and Rosso, LREC 2008

From Cohen Courses
Revision as of 07:15, 28 November 2010 by PastStudents (talk | contribs)


Yassine Benajiba and Paolo Rosso. 2008. Arabic Named Entity Recognition using Conditional Random Fields. In Proc. of Workshop on HLT&NLP within the Arabic World, LREC'08.

Online version

LREC 2008


This paper describes a Conditional Random Fields (CRF) approach to Arabic Named Entity Recognition (NER). Arabic is a highly inflectional language in which words can take both prefixes and suffixes. Beyond this complex morphology, Arabic script has no capital letters, which removes a cue that is a significant feature for NER.

The authors use Conditional Random Fields. To reduce data sparsity, they perform word segmentation, that is, they separate the different components of a word with a space character. The experiments use ANERcorp and target four types of named entities: person, location, organization and miscellaneous.
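To make the idea of segmentation concrete, here is a minimal sketch of splitting a word into space-separated components. It is not the segmenter used in the paper: the clitic lists, the `MIN_STEM` threshold, and the function name `segment` are all assumptions chosen for illustration, and the words are written in a Buckwalter-style Latin transliteration.

```python
# Hypothetical, tiny clitic inventory (Buckwalter-style transliteration).
# A real segmenter relies on a far richer model than these fixed lists.
PREFIXES = ("w", "b", "l", "Al")   # "and", "in/with", "to/for", "the"
SUFFIXES = ("hm", "hA", "h")       # "their", "her", "his"
MIN_STEM = 3                       # assumed minimum stem length, to limit over-splitting


def segment(word):
    """Greedily strip known prefixes, then at most one suffix, and join with spaces."""
    prefixes = []
    stem = word
    stripped = True
    while stripped:
        stripped = False
        for p in PREFIXES:
            if stem.startswith(p) and len(stem) - len(p) >= MIN_STEM:
                prefixes.append(p)
                stem = stem[len(p):]
                stripped = True
                break
    suffixes = []
    for s in SUFFIXES:
        if stem.endswith(s) and len(stem) - len(s) >= MIN_STEM:
            suffixes.append(s)
            stem = stem[:-len(s)]
            break
    return " ".join(prefixes + [stem] + suffixes)


print(segment("wbAlmdynp"))  # -> w b Al mdynp   ("and in the city")
print(segment("ktAbhm"))     # -> ktAb hm        ("their book")
```

After segmentation, the stem appears as its own token regardless of which clitics were attached, so the model sees fewer distinct surface forms. Purely rule-based stripping like this over-segments some words (for example, it splits the place name "bgdAd" into "b gdAd"), which is one reason a real system needs something more careful than this sketch.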

Before this paper, the authors used a Maximum Entropy model with binary features based on the word itself, the preceding word, the bigrams around the word and external resources. Furthermore, to make the named entities easier to detect, they used a two-step approach in which the first step detects the entities and the second step classifies them.
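The feature set and the two-step approach of the earlier system can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the feature names, the reading of "bigrams around the word" as the left and right bigrams, the toy gazetteers standing in for external resources, and the helper names `binary_features`, `detect` and `classify` are all hypothetical. The trained Maximum Entropy classifiers are replaced by simple lookup functions.

```python
def binary_features(tokens, i, gazetteer=frozenset()):
    """Return the set of binary features that fire for tokens[i]."""
    word = tokens[i]
    prev = tokens[i - 1] if i > 0 else "<s>"
    nxt = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    feats = {
        f"w={word}",                # the word itself
        f"w-1={prev}",              # the preceding word
        f"bigram-1={prev}_{word}",  # bigram ending at the word (assumed reading)
        f"bigram+1={word}_{nxt}",   # bigram starting at the word (assumed reading)
    }
    if word in gazetteer:           # external resource, here a toy word list
        feats.add("in_gazetteer")
    return feats


def detect(tokens, is_entity_token):
    """Step 1: mark entity boundaries with B/I/O tags, ignoring the entity class."""
    tags, inside = [], False
    for i in range(len(tokens)):
        if is_entity_token(tokens, i):
            tags.append("I" if inside else "B")
            inside = True
        else:
            tags.append("O")
            inside = False
    return tags


def classify(tokens, tags, classify_span):
    """Step 2: assign a class to each span found in step 1."""
    entities, start = [], None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        if start is not None and tag != "I":
            entities.append((start, i, classify_span(tokens[start:i])))
            start = None
        if tag == "B":
            start = i
    return entities


# Toy data in Buckwalter-style transliteration (hypothetical example sentence).
PERSONS = frozenset({"mHmwd", "EbAs"})
LOCATIONS = frozenset({"bgdAd"})
tokens = ["zAr", "mHmwd", "EbAs", "mdynp", "bgdAd"]

# Stand-ins for the two trained classifiers.
is_entity = lambda toks, i: "in_gazetteer" in binary_features(toks, i, PERSONS | LOCATIONS)
span_class = lambda span: "PER" if span[0] in PERSONS else "LOC"

tags = detect(tokens, is_entity)
print(tags)                                # -> ['O', 'B', 'I', 'O', 'B']
print(classify(tokens, tags, span_class))  # -> [(1, 3, 'PER'), (4, 5, 'LOC')]
```

Splitting the task this way lets the first step solve a simpler boundary problem before the second step chooses among the entity classes. Note that this toy detector would merge two directly adjacent entities into one span, which a trained boundary classifier would not necessarily do.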