R. K. Ando and T. Zhang. ACL 2005

Citation

R. K. Ando and T. Zhang. A High-Performance Semi-Supervised Learning Method for Text Chunking. In Proceedings of ACL 2005.

Online version

SSL for text chunking

Summary

This paper investigates a new semi-supervised learning method that addresses the problems of named entity (NE) chunking and syntactic chunking.

In this paper, I think NE chunking and NE tagging refer to the same task: both involve identifying the extent and type of each name in the text. This can be reformulated as the task of assigning a tag to each token using BIO tags.
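As a concrete illustration of that reformulation, here is a minimal Python sketch that converts a name annotation into per-token BIO tags; the sentence and the entity span are invented for this example, not taken from the paper.

 # Minimal sketch: reformulating NE chunking as per-token BIO tagging.
 # The sentence and the entity span are invented for illustration.
 tokens = ["Yesterday", "IBM", "Research", "opened", "a", "lab", "."]
 entities = [(1, 3, "ORG")]  # tokens 1-2 ("IBM Research") form an ORG name
 tags = ["O"] * len(tokens)
 for start, end, etype in entities:
     tags[start] = "B-" + etype          # B- marks the first token of a name
     for i in range(start + 1, end):
         tags[i] = "I-" + etype          # I- marks the remaining name tokens
 print(list(zip(tokens, tags)))
 # [('Yesterday', 'O'), ('IBM', 'B-ORG'), ('Research', 'I-ORG'), ('opened', 'O'), ...]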

Using the unlabeled data, they created numerous auxiliary problems related to the target task and trained a classifier for each of them. They then learned the common predictive structure shared by those problems. The authors argued that this common structure can be used to improve results on the target task. One example of such an auxiliary problem is: predict whether a word is "IBM" or not from its context. This is related to NE chunking, since knowing that a word is "IBM" helps to predict whether it is part of a name.
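A rough sketch of how such an auxiliary problem can be generated: the label (whether the hidden word is "IBM") is observed for free in the raw text, and the features use only the surrounding context, so no human annotation is needed. The specific context features below are an assumption for illustration, not the paper's exact feature set.

 # Sketch: building the auxiliary problem "is the current word 'IBM'?"
 # from unlabeled text. Labels come for free from the text itself; the
 # features see only the context, so the classifier must predict the word.
 unlabeled = [["shares", "of", "IBM", "rose", "sharply"],
              ["the", "company", "reported", "strong", "earnings"]]
 examples = []
 for sent in unlabeled:
     for i, word in enumerate(sent):
         left = sent[i - 1] if i > 0 else "<s>"
         right = sent[i + 1] if i + 1 < len(sent) else "</s>"
         features = {"left=" + left: 1, "right=" + right: 1}
         label = int(word == "IBM")      # observed, not hand-annotated
         examples.append((features, label))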

They used the structural learning algorithm SVD-ASO (SVD-based alternating structure optimization) to learn this predictive structure.

To evaluate their algorithm on the chunking tasks, they compared it with several baselines: a supervised classifier (the same model without the unlabeled data), co-training, and self-training.
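For reference, the self-training baseline can be sketched as the loop below; the classifier, confidence threshold, and number of rounds are illustrative assumptions, not the paper's exact setup.

 # Sketch of the self-training baseline: repeatedly train on the labeled
 # data, then move confidently predicted unlabeled examples into it.
 import numpy as np
 from sklearn.linear_model import LogisticRegression
 def self_train(X_lab, y_lab, X_unlab, rounds=5, threshold=0.95):
     for _ in range(rounds):
         clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
         if len(X_unlab) == 0:
             break
         proba = clf.predict_proba(X_unlab)
         keep = proba.max(axis=1) >= threshold   # trust only confident predictions
         if not keep.any():
             break
         X_lab = np.vstack([X_lab, X_unlab[keep]])
         y_lab = np.concatenate([y_lab, clf.classes_[proba[keep].argmax(axis=1)]])
         X_unlab = X_unlab[~keep]
     return clf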

They performed NE chunking on the CoNLL'03 dataset and syntactic chunking on the CoNLL'00 dataset, and their method outperformed previous systems and methods.

Structural Learning

Within their model, they assume there exists a low-dimensional predictive structure shared by multiple prediction problems. To learn this structure, they used a structural learning algorithm first introduced in Ando and Zhang 2004. The algorithm is similar to coordinate descent in the sense that each iteration either fixes the predictors and finds the optimal predictive structure, or fixes the predictive structure and finds the predictors that minimize the joint empirical risk.
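A minimal numpy sketch of the structure-finding half of that alternation, under assumed dimensions: stack the auxiliary predictors' weight vectors into a matrix and take its top left singular vectors as the shared structure Theta. The other half of the alternation (retraining the predictors with Theta fixed) is omitted here.

 # Sketch of the SVD step of SVD-ASO (the full algorithm alternates this
 # with retraining the predictors). Dimensions are illustrative.
 import numpy as np
 d, m, h = 1000, 300, 50     # feature dim, #auxiliary problems, structure dim
 W = np.random.randn(d, m)   # column j = learned weights of auxiliary problem j
 U, s, Vt = np.linalg.svd(W, full_matrices=False)
 Theta = U[:, :h].T          # h x d shared predictive structure
 # The target task then trains on augmented features [x; Theta @ x],
 # so it can exploit structure learned from the unlabeled data.
 x = np.random.randn(d)
 x_aug = np.concatenate([x, Theta @ x])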

They also explained the difference between this algorithm and PCA: SVD-ASO finds principal components in the predictor space, while PCA seeks them in the data space.
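A small sketch of that contrast (matrix shapes are illustrative assumptions): SVD-ASO factorizes the matrix of learned predictor weights, whereas PCA factorizes the centered data matrix itself.

 # Contrast: SVD-ASO finds shared directions among predictor weight
 # vectors (predictor space); PCA finds high-variance directions in the
 # data itself (data space). Shapes are illustrative.
 import numpy as np
 W = np.random.randn(1000, 300)   # d x m matrix of predictor weights
 X = np.random.randn(5000, 1000)  # n x d data matrix
 aso_dirs = np.linalg.svd(W, full_matrices=False)[0][:, :50]  # from predictors
 Xc = X - X.mean(axis=0)                                      # center the data
 pca_dirs = np.linalg.svd(Xc, full_matrices=False)[2][:50]    # from the data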