Difference between revisions of "R. Ghani. ICML 2002"

Revision as of 16:23, 30 November 2010

Citation

R. Ghani. Combining Labeled and Unlabeled Data for MultiClass Text Categorization. In Proceedings of ICML, 2002.

Online version

Summary

This paper presents a new semi-supervised learning algorithm.

It decomposes a multi-class classification problem into n binary problems using error-correcting output coding (ECOC), and uses Co-training to learn each individual binary classifier.
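The ECOC decomposition described above can be illustrated with a minimal sketch. It is not the paper's implementation: the helper names (`make_code_matrix`, `binary_labels`, `decode`), the random choice of codewords, and the nearest-codeword decoding by Hamming distance are assumptions made for illustration, and the Co-training step that would train each binary classifier is left out.

```python
import random


def make_code_matrix(n_classes, n_bits, seed=0):
    """Assign each class a random binary codeword of length n_bits.

    Hypothetical helper: the paper may construct its codes differently.
    """
    if n_classes < 2 or 2 ** n_bits < n_classes:
        raise ValueError("need n_classes >= 2 and 2**n_bits >= n_classes")
    rng = random.Random(seed)
    while True:
        codes = [tuple(rng.randint(0, 1) for _ in range(n_bits))
                 for _ in range(n_classes)]
        # Every codeword must be distinct, and every bit position must
        # separate at least one class from the others (no constant column).
        distinct = len(set(codes)) == n_classes
        useful = all(0 < sum(code[b] for code in codes) < n_classes
                     for b in range(n_bits))
        if distinct and useful:
            return codes


def binary_labels(labels, codes, bit):
    """Relabel multi-class labels as 0/1 targets for one binary problem."""
    return [codes[y][bit] for y in labels]


def decode(bit_predictions, codes):
    """Return the class whose codeword is closest in Hamming distance."""
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))
    return min(range(len(codes)),
               key=lambda c: hamming(codes[c], bit_predictions))


codes = make_code_matrix(n_classes=4, n_bits=6)
# One binary training set per bit; each would be handed to a binary learner.
targets_for_bit0 = binary_labels([0, 1, 2, 3, 1], codes, bit=0)
```

At prediction time, the n binary classifiers each output one bit, and `decode` maps the resulting bit vector back to a class. Using more bits than strictly necessary lets some individual bit errors be corrected.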

Experiments use the Hoovers dataset, which contains over 108,000 web pages of different companies. Since there is no natural feature split, the author randomly splits the vocabulary into two halves and treats them as two separate feature sets. The other dataset used for experiments is the Jobs dataset, where job titles and job descriptions are used as separate feature sets for Co-training.
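The random vocabulary split used when no natural feature split exists can be sketched as follows. This is an illustrative sketch only: the function names `split_vocabulary` and `project`, the fixed seed, and the token-list document representation are assumptions, not details taken from the paper.

```python
import random


def split_vocabulary(vocabulary, seed=0):
    """Randomly partition a vocabulary into two disjoint halves (views)."""
    words = sorted(vocabulary)  # sort first so the split is reproducible
    rng = random.Random(seed)
    rng.shuffle(words)
    half = len(words) // 2
    return set(words[:half]), set(words[half:])


def project(doc_tokens, view):
    """Keep only the tokens of a document that belong to one view."""
    return [t for t in doc_tokens if t in view]


vocab = {"software", "bank", "retail", "energy", "airline", "pharma"}
view_a, view_b = split_vocabulary(vocab)

doc = ["bank", "retail", "bank", "unknownword"]
doc_view_a = project(doc, view_a)  # features seen by the first classifier
doc_view_b = project(doc, view_b)  # features seen by the second classifier
```

Each Co-training classifier would then be trained on only one of the two projections of every document.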

@@ Line 10: / Line 10: @@
 This paper presents a new semi-supervised learning algorithm.
-First it decomposes multi-class classification problem into n binary ones using [[UsesMethod::Error correcting output coding|ECOC]]
+It decomposes multi-class classification problem into n binary ones using [[UsesMethod::Error correcting output coding|ECOC]]
-and Co-training is used for learning each individual binary classifier
+and [[UsesMethod::Co-training]] is used for learning each individual binary classifier.
+[[UsesDataset::Hoovers]] dataset that contains over 108,000 web pages of different companies. Since there are no natural feature split,
+the author randomly split the vocabulary into two halves and treat them as two separate feature sets.
+Another dataset used for experiments is [[UsesDataset::Jobs]] dataset.
+Job titles and job description are used separate feature sets for Co-training.
