Mota and Grishman, ACL-IJCNLP 2009


Citation

C. Mota and R. Grishman 2009. Updating a Name Tagger Using Contemporary Unlabeled Data. In Proceedings of the ACL-IJCNLP 2009 Conference Short Papers.

Online version

ACL Anthology

Summary

This paper investigates how to update a semi-supervised named entity tagger, trained on data from an earlier time period, so that it performs well on contemporary text, using contemporary unlabeled data. The authors used a co-training approach that starts from seed examples and exploits unlabeled data. They experimented on the CETEMPublico data set, a Portuguese journalistic corpus covering 8 years of news text, organized into six-month time spans.
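The co-training idea can be illustrated with a small sketch. This is a minimal, hypothetical example in the style of decision-list co-training over two views of each name: its spelling and its surrounding context. This page does not describe the paper's actual features or rule-selection criteria, so the feature strings, the seed format, `min_count`, and the rule-growth schedule (`step`) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Each example is a pair of feature sets: (spelling_features, context_features).
# Rules map a feature to (strength, label). All names here are hypothetical.


def induce_rules(labeled, view, max_rules, min_count=1):
    """Learn feature -> (strength, label) rules from labeled examples in one view.

    Strength is the fraction of examples containing the feature that carry its
    majority label; only the `max_rules` strongest rules are kept.
    """
    counts = defaultdict(Counter)
    for features, label in labeled:
        for f in features[view]:
            counts[f][label] += 1
    scored = []
    for f, by_label in counts.items():
        total = sum(by_label.values())
        label, n = by_label.most_common(1)[0]
        if total >= min_count:
            scored.append((n / total, total, f, label))
    scored.sort(reverse=True)  # strongest, then most frequent, first
    return {f: (strength, label) for strength, _, f, label in scored[:max_rules]}


def apply_rules(rules, examples, view):
    """Label each example with its strongest matching rule in this view, if any."""
    labeled = []
    for features in examples:
        matches = [rules[f] for f in features[view] if f in rules]
        if matches:
            _, label = max(matches)
            labeled.append((features, label))
    return labeled


def co_train(seed_rules, unlabeled, iterations=5, step=5):
    """Alternate between views; each view labels data for the other."""
    spelling_rules = dict(seed_rules)  # seeds are spelling rules
    context_rules = {}
    n = step
    for _ in range(iterations):
        labeled = apply_rules(spelling_rules, unlabeled, view=0)
        context_rules = induce_rules(labeled, view=1, max_rules=n)
        labeled = apply_rules(context_rules, unlabeled, view=1)
        # Seeds always override induced spelling rules.
        spelling_rules = {**induce_rules(labeled, view=0, max_rules=n), **seed_rules}
        n += step  # allow more rules each round
    return spelling_rules, context_rules


# Toy unlabeled pool (hypothetical features, not from the paper).
unlabeled = [
    ({"full=Lisboa", "cap"}, {"left=em"}),
    ({"full=Porto", "cap"}, {"left=em"}),
    ({"full=Silva", "cap"}, {"left=senhor"}),
    ({"full=Costa", "cap"}, {"left=senhor"}),
    ({"full=Coimbra", "cap"}, {"left=em"}),
]
seeds = {"full=Lisboa": (1.0, "LOC"), "full=Silva": (1.0, "PER")}

spelling_rules, context_rules = co_train(seeds, unlabeled)
print(spelling_rules["full=Costa"], spelling_rules["full=Coimbra"])
```

Each view labels the unlabeled pool for the other, so two spelling seeds propagate through shared contexts (`em`, `senhor`) to names the seeds never mention, such as `Costa` and `Coimbra`. In this framing, updating the unlabeled pool with contemporary text changes which contexts and names the rules can be learned from.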

The authors performed experiments in order to answer two questions:

  • Is it better to update the seed or the unlabeled data?

The experiments showed that using seeds from the same time period as the test data does not help as much as using unlabeled data from that period. A closer analysis showed that training with contemporary unlabeled data improves tag classification.
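To make this comparison concrete, the sketch below maps dates to six-month segments and lists which segment supplies the seeds and which supplies the unlabeled data in each condition. The start year (1991), the condition names, and the assumption that unlabeled text from the test period is disjoint from the test documents are illustrative choices, not details taken from the paper.

```python
START_YEAR = 1991  # assumption: first year covered by the corpus


def period_index(year, month, start_year=START_YEAR):
    """Map a (year, month) date to a six-month segment; 0 is Jan-Jun of start_year."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    return (year - start_year) * 2 + (0 if month <= 6 else 1)


def build_conditions(test_period, older_period):
    """Hypothetical seed/unlabeled pairings compared for one test period.

    Unlabeled data from the test period is assumed to be disjoint from the
    test documents themselves.
    """
    return {
        "older_seed_older_unlabeled": {"seed": older_period, "unlabeled": older_period},
        "contemporary_seed": {"seed": test_period, "unlabeled": older_period},
        "contemporary_unlabeled": {"seed": older_period, "unlabeled": test_period},
    }


conditions = build_conditions(
    test_period=period_index(1998, 7), older_period=period_index(1992, 1)
)
for name, setup in conditions.items():
    print(name, setup)
```

With 8 years and two segments per year, the indices run from 0 to 15, so each experiment amounts to choosing two segment indices and training on the corresponding pair.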

  • Is it better to use larger amounts of older unlabeled data?

The authors observed that increasing the size of the unlabeled data does not always improve performance.

Using contemporary unlabeled data outperforms both using a larger amount of older unlabeled data and using contemporary seeds. Therefore, updating the tagger does not require labeling new data or using more training data.

Related Papers

An earlier paper, Seymore et al., AAAI 1999, also used an HMM and represented different types of entities in the same model. They showed that an HMM with multiple states per entity outperforms an HMM with a single state per entity.