Difference between revisions of "C. Mota and R. Grishman. ACL-IJCNLP 2009"

Latest revision as of 15:06, 31 October 2010

Citation

C. Mota & R. Grishman. Updating a Name Tagger Using Contemporary Unlabeled Data. In Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, 2009.

Online version

updating name tagger using unlabeled data: http://www.aclweb.org/anthology/P/P09/P09-2089.pdf

Summary

In this paper, the authors investigate the role that unlabeled data from different time periods plays in semi-supervised learning for the named entity tagging task.

They use the name tagger described in Mota and Grishman, LREC 2008, which is based on a co-training NE classifier.
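As a rough illustration of the general co-training idea (not the tagger from Mota and Grishman, LREC 2008), the sketch below alternates between two views of each name candidate. It assumes each example is a hypothetical `(spelling, context)` pair and that a rule is kept only when all labeled examples sharing a feature agree; the function names, the toy data, and the rule criterion are all assumptions made for this example.

```python
from collections import defaultdict

def induce_rules(examples, labels, view):
    """Learn feature -> label rules for one view from the labeled examples.

    A feature becomes a rule only if every labeled example that has it
    carries the same label (an assumed, deliberately strict criterion).
    """
    seen = defaultdict(set)
    for example, label in zip(examples, labels):
        if label is not None:
            seen[example[view]].add(label)
    return {feat: next(iter(ls)) for feat, ls in seen.items() if len(ls) == 1}

def apply_rules(examples, rules, view):
    """Label each example with the rule matching its feature, or None."""
    return [rules.get(example[view]) for example in examples]

def co_train(examples, seed_rules, rounds=3):
    """Alternate between the spelling view (0) and the context view (1)."""
    spelling_rules = dict(seed_rules)
    context_rules = {}
    for _ in range(rounds):
        # Spelling rules label data; context rules are learned from those labels.
        labels = apply_rules(examples, spelling_rules, view=0)
        context_rules = induce_rules(examples, labels, view=1)
        # Context rules label data; spelling rules are learned from those labels.
        labels = apply_rules(examples, context_rules, view=1)
        learned = induce_rules(examples, labels, view=0)
        spelling_rules = {**learned, **seed_rules}  # seeds are never overwritten
    return spelling_rules, context_rules

# Hypothetical toy data: (spelling feature, context feature)
examples = [("Lisbon", "in"), ("Porto", "in"), ("Silva", "said"), ("Costa", "said")]
seeds = {"Lisbon": "LOC", "Silva": "PER"}
spelling_rules, context_rules = co_train(examples, seeds)
```

Here the seeds label "Lisbon" and "Silva", the context view generalizes from them, and the spelling view then picks up "Porto" and "Costa" from the unlabeled examples.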

The experiments use the CETEMPublico dataset. The test sets are fixed and drawn from the most recent epoch, while either the seed set or the unlabeled data is drawn from different epochs.
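The setup above can be pictured as a small grid of training configurations. The following is a minimal sketch, assuming hypothetical epoch labels and enumerating every seed/unlabeled combination; the paper may evaluate only a subset of these, and `build_configurations` is a name introduced here for illustration.

```python
from itertools import product

def build_configurations(epochs):
    """Enumerate training configurations over chronologically ordered epochs.

    The test epoch is always the most recent one; the seed epoch and the
    unlabeled-data epoch vary independently (an assumption of this sketch).
    """
    test_epoch = epochs[-1]
    return [
        {"seeds": seed_epoch, "unlabeled": unlabeled_epoch, "test": test_epoch}
        for seed_epoch, unlabeled_epoch in product(epochs, repeat=2)
    ]

# Hypothetical epoch labels, oldest first
epochs = ["epoch_1", "epoch_2", "epoch_3"]
configs = build_configurations(epochs)
```

Comparing configurations that differ only in the `unlabeled` field isolates the effect of the age of the unlabeled data, and likewise for the `seeds` field.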

They found that:

  1. In training, adding more recent unlabeled data outperforms adding contemporary labeled data.
  2. Adding a larger amount of older unlabeled data did not improve performance compared with adding a smaller set of contemporary unlabeled data.