Difference between revisions of "Tur et al, NLEJ 2003"

From Cohen Courses

Revision as of 16:14, 2 October 2010

Citation

Tür, G., Hakkani-Tür, D., Oflazer, K. 2003. A Statistical Information Extraction System for Turkish. Natural Language Engineering 9(2), 181–210

Online version

CiteSeerX

Summary

Turkish is an agglutinative language, which allows thousands of word forms to be produced from a single root. This structure causes data sparseness, which in turn reduces the effectiveness of statistical methods. To deal with this problem, researchers work with the morphological form of the word instead of the surface form.
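
The sparseness argument can be illustrated with a small sketch. The `TOY_ROOTS` table and the `count_types` helper below are hypothetical and hand-written for illustration; they are not taken from the paper, and a real system would obtain roots from a morphological analyzer.

```python
from collections import Counter

# Hypothetical, hand-written analyses (surface form -> root).
# A real system would get these from a morphological analyzer.
TOY_ROOTS = {
    "ev": "ev",            # house
    "evler": "ev",         # houses
    "evde": "ev",          # in the house
    "evim": "ev",          # my house
    "evlerimizde": "ev",   # in our houses
    "kitap": "kitap",      # book
    "kitaplar": "kitap",   # books
}

def count_types(tokens, roots=TOY_ROOTS):
    """Count distinct surface forms and distinct roots in a token list."""
    surface = Counter(tokens)
    # Unknown tokens fall back to their surface form.
    root = Counter(roots.get(t, t) for t in tokens)
    return surface, root

tokens = ["ev", "evler", "evde", "evim", "evlerimizde", "kitap", "kitaplar"]
surface_counts, root_counts = count_types(tokens)
print(len(surface_counts), "surface types,", len(root_counts), "root types")
```

Every surface form in this toy sample occurs only once, while the two roots occur several times each, so counts collected over roots are far less sparse than counts collected over surface forms.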

This paper is important because it is the first work to use statistical methods for information extraction (IE) tasks in Turkish. The authors focus on three IE subtasks.

  • Sentence Segmentation

The authors reduce the sentence segmentation problem to a boundary classification problem, in which each word is followed by a boundary flag that denotes whether a sentence boundary occurs at that point. Their input lacks punctuation and case information. They use a model that combines a language model (LM) of the surface form with an LM of the final inflectional form taken from the morphological form of the word.
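
The boundary-flag representation described above can be sketched as follows. This is a minimal illustration of the data format only, not the authors' model; the function name `to_boundary_sequence` and the example sentences are assumptions made for this sketch.

```python
def to_boundary_sequence(sentences):
    """Flatten tokenized sentences into (word, flag) pairs.

    flag is True when a sentence boundary follows the word.
    Punctuation tokens are dropped and words are lower-cased to mimic
    input without punctuation or case.
    """
    pairs = []
    for tokens in sentences:
        # Keep only tokens that contain at least one letter or digit.
        # Note: str.lower() is not Turkish-aware (it maps "I" to "i",
        # not to dotless "ı"); that is acceptable for this toy example.
        words = [t.lower() for t in tokens if any(ch.isalnum() for ch in t)]
        for i, word in enumerate(words):
            pairs.append((word, i == len(words) - 1))
    return pairs

example = [["Ali", "geldi", "."], ["Ayşe", "gitti", "."]]
pairs = to_boundary_sequence(example)
print(pairs)
```

A classifier for this task would then predict the flag for each position from the surrounding words, which is where the combined language models come in.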

  • Topic Segmentation

As with sentence boundaries, a topic boundary approach is used here. LMs are created after clustering the topics. The authors started with a word-based model and generalized it by creating a stem-based model. The noun-based model, which is the most general approach, achieved the best accuracy.

  • Name Tagging

The authors combine four models for this task.

    • Lexical Model: An HMM over word/tag combinations, used to capture lexical information.
    • Contextual Model: This model helps tag unknown words by using the context clues around them.
    • Morphological Model: This model uses morphological information to identify proper nouns.
    • Name Tag Model: This model favors probable, correct tag sequences by using the name tag information.