Difference between revisions of "Cohen and Carvalho, 2005"

Revision as of 06:01, 1 November 2010

Citation

Cohen, W. W., & Carvalho, V. (2005). Stacked sequential learning. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI).

Online version

[[1]]

Summary

This paper introduces a novel scheme called stacked sequential learning. Stacked sequential learning is a meta-learning technique that takes an arbitrary base learner and improves its performance by making it aware of nearby examples. The experiments show that the stacked sequential technique improves the performance of both sequential (e.g. conditional random fields) and non-sequential algorithms. The technique can be applied to any base learner and imposes only a constant overhead in training time.

The technique is evaluated on the problem of recognizing the signature section of an email. The dataset contains 617 emails. Each email is represented as a sequence of feature vectors, one for each line of the email. A line is labeled positive if it is part of the email signature and negative otherwise. Hand-crafted features (e.g. "line is blank") are used to represent each line. The dataset contains 33,013 lines in total, of which about 10% are labeled positive. The authors also tested the technique on additional segmentation tasks, such as classifying lines from FAQ documents and video segmentation.

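Given a sample <math>S</math>, the algorithm first partitions <math>S</math> into <math>K</math> equal-sized disjoint subsets <math>S_1,...,S_K</math> and learns <math>K</math> functions <math>f_1,...,f_K</math>. Each function <math>f_j</math> is trained on all the data except the j-th subset <math>S_j</math>. We then construct the set <math>S'=\{(x_t,y'_t) : y'_t=f_j(x_t)\ and\ x_t \in S_j\}</math>, so every predicted label <math>y'_t</math> comes from a function that did not see <math>x_t</math> during training. This is the basis of the sequential stacking algorithm. The set <math>S'</math> is then used to create a new dataset of extended instances. In the simplest case, an extended instance is the vector <math>(x_i,y'_{i-1})</math>, where <math>y'_{i-1}</math> is the (i-1)-th label in <math>y'</math>.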
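
The procedure above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the nearest-centroid base learner, the round-robin assignment of sequences to folds, the padding value for the first position, and all function names (fit_nearest_centroid, cross_validated_labels, extend, fit_stacked) are assumptions made for illustration. It covers only the simplest extended instance described above, i.e. the previous predicted label, not a larger window.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Predictor = Callable[[Vector], int]


def fit_nearest_centroid(X: Sequence[Vector], y: Sequence[int]) -> Predictor:
    """Toy base learner (an assumption): predict the class with the closest mean."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x: Vector) -> int:
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )

    return predict


def flatten(seqs):
    """Concatenate a list of sequences into one flat list."""
    return [item for seq in seqs for item in seq]


def cross_validated_labels(seqs_x, seqs_y, k, fit=fit_nearest_centroid):
    """Build y': each sequence is labeled by a function that never saw it.

    Requires k >= 2 so that every f_j has some training data.
    """
    folds = [list(range(j, len(seqs_x), k)) for j in range(k)]  # S_1..S_K
    y_hat = [None] * len(seqs_x)
    for held_out in folds:
        train = [i for i in range(len(seqs_x)) if i not in held_out]
        f_j = fit(flatten([seqs_x[i] for i in train]),
                  flatten([seqs_y[i] for i in train]))
        for i in held_out:
            y_hat[i] = [f_j(x) for x in seqs_x[i]]
    return y_hat


def extend(seq_x, seq_y_hat, pad=0):
    """Extended instance: x_i plus the predicted label of position i-1."""
    prev = [pad] + list(seq_y_hat[:-1])  # pad stands in for the missing y'_0
    return [list(x) + [float(p)] for x, p in zip(seq_x, prev)]


def fit_stacked(seqs_x, seqs_y, k=2, fit=fit_nearest_centroid):
    """Train a base function on S and a second function on extended instances."""
    y_hat = cross_validated_labels(seqs_x, seqs_y, k, fit)
    extended = [extend(x, yh) for x, yh in zip(seqs_x, y_hat)]
    f = fit(flatten(seqs_x), flatten(seqs_y))            # base function on all of S
    f_stacked = fit(flatten(extended), flatten(seqs_y))  # function on extended data

    def predict(seq_x):
        first_pass = [f(x) for x in seq_x]
        return [f_stacked(x) for x in extend(seq_x, first_pass)]

    return predict


# Tiny made-up example: one feature per line, label 1 when the feature is large.
seqs_x = [[[0.0], [0.1], [0.9], [1.0]], [[0.2], [0.0], [0.8], [0.9]],
          [[0.1], [1.0], [0.9]], [[0.0], [0.2], [1.0]]]
seqs_y = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1], [0, 0, 1]]
stacked = fit_stacked(seqs_x, seqs_y, k=2)
print(stacked(seqs_x[0]))
```

Here whole sequences, rather than individual lines, are assigned to folds so that a sequence's predicted labels always come from a function trained without it. At prediction time the base function trained on all of S supplies the first-pass labels that the stacked function consumes.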

The initial results show that stacked sequential learning reduces the error rate of the Maximum Entropy (ME) technique from 3.20% to 2.63%. With a window size of 20, the s-ME technique achieves an error rate of 0.71%. Compared with the CRF error rate of 1.17%, this is a statistically significant improvement.

Related papers