Kucuk and Yazici, FQAS 2009

From Cohen Courses

Latest revision as of 14:23, 22 October 2010

Citation

Küçük, D. and Yazici, A. 2009. Named Entity Recognition Experiments on Turkish Texts. In Proceedings of the International Conference on Flexible Query Answering Systems. Roskilde, Denmark. T. Andreasen et al. (Eds.): FQAS 2009, LNAI 5822, pp. 524-535

Online version

LNAI 5822

Summary

This paper describes the first rule-based approach to the Named Entity Recognition (NER) task on Turkish texts. The system draws on several external information sources:

  • Lexical resources: dictionaries and lists of well-known entities
  • Pattern bases: context patterns used to identify entities
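A minimal Python sketch can show how the two kinds of resources work together. It is not the authors' implementation. The lexicon entries, the two regular-expression patterns (the honorific "Sayın" before a person name, and "<Name> Üniversitesi" for universities) and the function name `recognize` are all hypothetical stand-ins for the paper's much larger resources.

```python
import re
from typing import List, Tuple

# Hypothetical toy lexical resource: known entity names and their types.
LEXICON = {
    "Ankara": "LOCATION",
    "Bursa": "LOCATION",
    "Ahmet": "PERSON",
}

# Hypothetical pattern base: each regex captures the entity in group 1.
PATTERNS = [
    # The honorific "Sayın" is followed by a person name.
    (re.compile(r"\bSayın (\w+)"), "PERSON"),
    # "<Name> Üniversitesi" ("<Name> University") is an organization.
    (re.compile(r"\b(\w+ Üniversitesi)\b"), "ORGANIZATION"),
]


def recognize(text: str) -> List[Tuple[str, str]]:
    """Return (surface form, type) pairs in order of appearance."""
    found = []  # (start offset, surface form, type)
    # Step 1: lexicon lookup. Only whole words match, so a suffixed form
    # such as "Ankara'da" ("in Ankara") still matches the entry "Ankara".
    for name, etype in LEXICON.items():
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append((m.start(), m.group(0), etype))
    # Step 2: context patterns from the pattern base.
    for pattern, etype in PATTERNS:
        for m in pattern.finditer(text):
            found.append((m.start(1), m.group(1), etype))
    # Merge: sort by offset and keep one hit per (offset, surface form).
    seen = set()
    result = []
    for start, span, etype in sorted(found):
        if (start, span) not in seen:
            seen.add((start, span))
            result.append((span, etype))
    return result


print(recognize("Sayın Yılmaz dün Ankara'da Hacettepe Üniversitesi öğrencileriyle görüştü."))
# [('Yılmaz', 'PERSON'), ('Ankara', 'LOCATION'), ('Hacettepe Üniversitesi', 'ORGANIZATION')]
```

Whole-word matching lets the entry "Ankara" cover the apostrophe-separated suffixed form "Ankara'da". A derived form written without an apostrophe, such as "Ankaralı" ("from Ankara"), would still be missed. Lexicon lookup is a natural fit for Turkish for this reason, but it only helps when inflections are separated from the name.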

The authors experimented on news texts from the METU Turkish Corpus and on texts from additional sources such as children's stories, historical texts and news video transcriptions. All of these texts were manually annotated by the authors.

For news articles, they report an F-measure of 78.7%. Their error analysis identifies the following causes of lower accuracy:

  • Precision of person name recognition is low because some common nouns that can also be used as person names are extracted as entities. The patterns suffer from a similar precision problem. Both problems may stem from the system not using capitalization information.
  • Compound organization names are sometimes recognized as two separate entities.
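The sketch below makes the precision problem concrete and shows why capitalization could matter. It assumes the standard entity-level definitions of precision, recall and F-measure (F is the harmonic mean of precision and recall); the exact scoring scheme used in the paper is not described on this page. The name lexicon, the example sentence and the `use_case` switch are hypothetical illustrations, not part of the authors' system. "Deniz" and "Umut" are common Turkish given names, but "deniz" ("sea") and "umut" ("hope") are also ordinary nouns.

```python
from typing import List, Set, Tuple

# (token index, surface form, type): a prediction counts as correct only
# if all three fields match the gold annotation.
Entity = Tuple[int, str, str]

# Hypothetical person-name lexicon, stored lowercase.
PERSON_NAMES = {"deniz", "umut"}


def tag_persons(tokens: List[str], use_case: bool) -> Set[Entity]:
    """Tag lexicon hits; if use_case, also require an initial capital."""
    tagged = set()
    for i, tok in enumerate(tokens):
        if tok.lower() in PERSON_NAMES and (not use_case or tok[:1].isupper()):
            tagged.add((i, tok, "PERSON"))
    return tagged


def precision_recall_f(pred: Set[Entity], gold: Set[Entity]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and balanced F-measure."""
    tp = len(pred & gold)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# "Today Deniz walked along the seaside with Umut."
tokens = "Deniz bugün Umut ile deniz kenarında yürüdü".split()
gold = {(0, "Deniz", "PERSON"), (2, "Umut", "PERSON")}

for use_case in (False, True):
    p, r, f = precision_recall_f(tag_persons(tokens, use_case), gold)
    print(f"use_case={use_case}: P={p:.2f} R={r:.2f} F={f:.2f}")
# use_case=False: P=0.67 R=1.00 F=0.80
# use_case=True: P=1.00 R=1.00 F=1.00
```

Without case information, the common noun "deniz" is extracted as a person, which lowers precision while recall is unaffected. Capitalization is an imperfect cue on its own, since sentence-initial words are capitalized as well, so this only illustrates the failure mode the authors describe.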

Accuracy was even lower when the system was applied to other domains, because many of the entities in those texts are absent from the lexical resources.

Related Papers

There are only two other related papers on NER for Turkish texts: Cucerzan and Yarowsky, SIGDAT 1999, which uses a language-independent bootstrapping algorithm, and Tur et al, NLEJ 2003, which uses statistical methods.