R. K. Ando and T. Zhang. ACL 2005
Revision as of 17:08, 31 October 2010
== Citation ==
R. K. Ando and T. Zhang. A High-Performance Semi-Supervised Learning Method for Text Chunking. In Proceedings of ACL 2005.
== Online version ==
== Summary ==
This paper investigates a new semi-supervised learning method that addresses named entity (NE) chunking and syntactic chunking.
To utilize the unlabeled data, the authors create numerous auxiliary problems related to the target task and train a classifier for each of them. They then learn the common predictive structure shared by those problems, arguing that this common structure can be used to improve results on the target task. One example of such an auxiliary problem is predicting whether a word is "IBM" from its context. This is related to NE chunking, since knowing that a word is "IBM" helps to predict whether it is part of a name. Crucially, the labels for such auxiliary problems can be generated automatically from unlabeled text, so no extra annotation is needed.
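The auxiliary-problem idea above can be sketched as follows. This is a minimal illustration, not the paper's actual feature set: the function name, the `L:`/`R:` context-feature encoding, and the window size are all assumptions made for the example.

```python
def auxiliary_examples(tokens, target="IBM", window=2):
    """Build (context_features, label) pairs for one auxiliary problem:
    predict whether the current word is `target` from its context only.
    Labels come from the unlabeled text itself, so no annotation is needed."""
    examples = []
    for i, tok in enumerate(tokens):
        left = tokens[max(0, i - window):i]          # words to the left
        right = tokens[i + 1:i + 1 + window]         # words to the right
        context = ["L:" + w for w in left] + ["R:" + w for w in right]
        label = 1 if tok == target else 0            # auto-generated label
        examples.append((context, label))
    return examples

sentence = "shares of IBM rose sharply today".split()
pairs = auxiliary_examples(sentence)
```

In the paper, thousands of such problems (one per frequent word, plus others based on predicted tags) are created, and a linear classifier is trained for each.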
== Structural learning ==
They use the structural learning framework introduced in [[]]. The goal is to find a low-dimensional predictive structure shared by all auxiliary problems. This is solved with alternating structure optimization ([[UsesMethod::ASO]]), in which the maximal commonality of all the predictors is captured from the left singular vectors given by an SVD of the matrix whose columns are the predictors' weight vectors. In effect, ASO finds principal components in predictor space, whereas [[UsesMethod::PCA]] seeks them in data space.
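The SVD step described above can be sketched as follows. This is a simplified sketch of the idea, not the paper's full alternating optimization: the dimensions, the planted low-rank structure, and all variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_aux, h = 50, 20, 5

# Pretend these are the weight vectors of 20 linear predictors trained on
# auxiliary problems, stacked as columns of W (features x problems).
# Here we plant a shared rank-h structure plus a little noise.
shared = rng.normal(size=(n_features, h))
W = shared @ rng.normal(size=(h, n_aux)) + 0.01 * rng.normal(size=(n_features, n_aux))

# The top-h left singular vectors of W capture the maximal commonality
# of the predictors; their span is the shared structure Theta.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
Theta = U[:, :h].T            # h x n_features shared structure

# For the target task, Theta maps a feature vector to h extra
# low-dimensional features that are appended to the original ones.
x = rng.normal(size=n_features)
z = Theta @ x                 # shared-structure features for the target task
```

Note the contrast with PCA: here the SVD is applied to the matrix of learned predictor weights, not to the data matrix itself.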