Difference between revisions of "Soft Supervised Text Classification"

From Cohen Courses
Jump to navigationJump to search
Line 1: Line 1:
 
== Citation ==
 
== Citation ==
{{MyCiteconference | booktitle = Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) | coauthors =  J.Bilmes | date = 2008| first = A.| last = Subramanya| title = ESoft-Supervised Learning for Text Classification| url =http://ssli.ee.washington.edu/~bilmes/mypubs/subramanya2008-ssltc.pdf}}
+
{{MyCiteconference | booktitle = Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) | coauthors =  J.Bilmes | date = 2008| first = A.| last = Subramanya| title = Soft-Supervised Learning for Text Classification| url =http://ssli.ee.washington.edu/~bilmes/mypubs/subramanya2008-ssltc.pdf}}
  
 
This [[Category::Paper]] is available online [http://ssli.ee.washington.edu/~bilmes/mypubs/subramanya2008-ssltc.pdf].
 
This [[Category::Paper]] is available online [http://ssli.ee.washington.edu/~bilmes/mypubs/subramanya2008-ssltc.pdf].
  
== Online ==
+
== Background ==
 +
 
 +
Graph-based semi-supervised learning (SSL) approaches aim to predict labels over unlabeled data, using the labeled datapoints and the structure of the graph.  Often, the graph structure is represented by the graph Laplacian.  In common formulations, the Laplacian is in a regularization term (to ensure smoothness over the graph when labeling unlabeled points), and the loss function (commonly quadratic loss) tries to ensure unlabeled points close to labeled points have similar labels as the labeled data. 
  
 
== Summary ==  
 
== Summary ==  
 +
 +
This work concentrates on graph-based semi-supervised learning for cases where a datapoint can have multiple labels associated with it, for example in document classification, where a particular document can have multiple topic labels.  Existing graph-based approaches are often sub-optimal in this regard, since many assume binary classification tasks and then generalize to multi-class problems by using some sort of one vs. rest strategy for classification, which means that the different classifiers are trained independent of one another. 
 +
 +
This work attempts to solve that problem by proposing a graph-based SSL framework, where the loss function is based on minimizing the Kullback-Leibler Divergence between two distributions, and the regularizer is based on entropy (maximization).  Using probability distributions on the vertices is better than existing approaches that use fixed integer labels (or relaxations consisting of real values in the <math>[0, 1]</math> interval), because we can generalize to multi-class classification easier, and also allows us to use information theory-based machinery, which the authors argue (through their results) is superior to Euclidean-based methods. 
  
 
== Main Approach ==
 
== Main Approach ==
 +
 +
First some SSL-related preliminaries. 
  
 
== Baseline & Results ==
 
== Baseline & Results ==
  
 
== Related Work ==
 
== Related Work ==

Revision as of 19:59, 25 September 2011

Citation

Soft-Supervised Learning for Text Classification, by A. Subramanya, J.Bilmes. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2008.

This Paper is available online [1].

Background

Graph-based semi-supervised learning (SSL) approaches aim to predict labels over unlabeled data, using the labeled datapoints and the structure of the graph. Often, the graph structure is represented by the graph Laplacian. In common formulations, the Laplacian is in a regularization term (to ensure smoothness over the graph when labeling unlabeled points), and the loss function (commonly quadratic loss) tries to ensure unlabeled points close to labeled points have similar labels as the labeled data.

Summary

This work concentrates on graph-based semi-supervised learning for cases where a datapoint can have multiple labels associated with it, for example in document classification, where a particular document can have multiple topic labels. Existing graph-based approaches are often sub-optimal in this regard, since many assume binary classification tasks and then generalize to multi-class problems by using some sort of one vs. rest strategy for classification, which means that the different classifiers are trained independent of one another.

This work attempts to solve that problem by proposing a graph-based SSL framework, where the loss function is based on minimizing the Kullback-Leibler Divergence between two distributions, and the regularizer is based on entropy (maximization). Using probability distributions on the vertices is better than existing approaches that use fixed integer labels (or relaxations consisting of real values in the interval), because we can generalize to multi-class classification easier, and also allows us to use information theory-based machinery, which the authors argue (through their results) is superior to Euclidean-based methods.

Main Approach

First some SSL-related preliminaries.

Baseline & Results

Related Work