Reyyan project abstract

From Cohen Courses
Revision as of 13:22, 29 September 2010 by PastStudents (talk | contribs)

Summary

In this project, I am going to develop an information extraction system for Turkish. Only a couple of studies have addressed named entity recognition (NER) for Turkish: one used rule-based methods and another used statistical methods. I am planning to apply more recent methods, such as conditional random fields (CRFs), to Turkish texts.
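As a rough illustration of what a CRF-based tagger would consume, the sketch below builds a per-token feature dictionary. It is not the project's actual feature set: the function name `token_features` and every feature name are hypothetical. It relies on one assumption about the text, namely that case suffixes on Turkish proper nouns are written after an ASCII apostrophe (as in "Ankara'da", "in Ankara").

```python
def token_features(tokens, i):
    """Return an illustrative feature dict for tokens[i] (hypothetical feature names)."""
    word = tokens[i]
    # Split at the first ASCII apostrophe: "Ankara'da" -> ("Ankara", "'", "da").
    # A typographic apostrophe would not be split by this sketch.
    stem, sep, suffix = word.partition("'")
    feats = {
        # Note: str.lower() is not Turkish-aware ("I" becomes "i", not "ı").
        "lower": word.lower(),
        "stem": stem.lower(),
        "has_apostrophe": bool(sep),
        "suffix": suffix.lower(),
        "is_capitalized": word[:1].isupper(),
        "last3": word[-3:].lower(),
        "is_first": i == 0,
    }
    # Neighbouring words give the tagger local context.
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].lower()
    if i < len(tokens) - 1:
        feats["next_lower"] = tokens[i + 1].lower()
    return feats


sentence = ["Ali", "Ankara'da", "yaşıyor"]
features = [token_features(sentence, i) for i in range(len(sentence))]
```

A list of such dictionaries per sentence is the usual input shape for linear-chain CRF toolkits, which is why the features are kept as plain key-value pairs.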

Data

I am going to use the same training data that was used in one of the previous studies. The data consists of news articles annotated with person, location, and organization tags.

I also have a parallel English-Turkish corpus. I can use a bootstrapping method to tag this data. Another idea that may work is to tag the Turkish side of the data by matching Turkish and English entities through their dependency parse trees.
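The idea of carrying English entity tags over to the Turkish side can be sketched in a much simpler form than the dependency-tree matching described above. The example below uses plain string matching instead: it assumes the English entities are already tagged, that names are spelled the same in both languages, and that Turkish suffixes follow an ASCII apostrophe. The function name `project_entities` and the label strings are hypothetical.

```python
def project_entities(english_entities, turkish_tokens):
    """Tag Turkish tokens by matching them against English entity strings.

    english_entities: list of (entity_text, label) pairs from the English side.
    Returns a list of (token, label) pairs; unmatched tokens get "O".
    """
    # Map every word of every English entity to that entity's label.
    lookup = {}
    for text, label in english_entities:
        for word in text.split():
            lookup[word.casefold()] = label

    tagged = []
    for token in turkish_tokens:
        # Drop any suffix after the apostrophe: "Ankara'ya" -> "Ankara".
        stem = token.partition("'")[0]
        tagged.append((token, lookup.get(stem.casefold(), "O")))
    return tagged


english_entities = [("Ali Demir", "PERSON"), ("Ankara", "LOCATION")]
turkish_tokens = ["Ali", "Demir", "dün", "Ankara'ya", "gitti"]
tagged = project_entities(english_entities, turkish_tokens)
```

Names that differ between the two languages (for example "Turkey" and "Türkiye") would be missed by this matching, which is one reason a structure-aware approach may be needed.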

Motivation

As a Turkish student, I want to apply what I have learned in this course to Turkish. Working with Turkish presents its own challenges, and I want to see how they affect the NER task.

Superpowers

I know Turkish, which I believe is a good starting point.

Team Members

Reyyan Yeniterzi