Cohen and Carvalho, 2005

From Cohen Courses
Jump to navigationJump to search


Cohen, W. W., & Carvalho, V. (2005). Stacked sequential learning. In Proceedings of the international joint conference on artificial intelligence (IJCAI).

Online version



This paper introduces a novel schema for stacked sequential learning. Stacked sequential learning is a meta-learning technique that takes an arbitrary base learner and improves its performance by making it aware of nearby examples. In the experiments they have shown that stacked sequential technique improves performance of both sequential (e.g. conditional random field) and non-sequential algorithms. This technique can be applied on any base learner and imposes only a constant overhead in training time.

This technique is evaluated on the problem of recognizing the signature section of an email. The dataset contains 617 emails. Each email is represented by a vector of feature vectors. Each element of this vector is a feature vector for each line of the email. A line is labeled as positive if it is part of the email signature, otherwise it is labeled as negative. Hand-crafted features (e.g. "line is blank") are used to represent each line of the email. The dataset totally contains 33,013 lines which about 10% are labeled as positive. They have also tested their technique on additional segmentation tasks such as classifying lines from FAQ documents, video segmentation, etc.

Given a sample S, the algorithm first segments S into K equal-sized disjoint subsets () and learn K functions . Each function is trained on all the data except the i'th subset . We then construct set . This is the basis of sequential stacking algorithm. Set is then used to create a new dataset of extended instances. An extended instance in a simplest case is a vector composed of an instance where is the (i-1)-th label in .

In the initial results they have shown that stacked sequential learning can reduce the error of Maximum Entropy (ME) technique from 3.20% to 2.63%. They have also shown that they can achieve error rate of 0.71% by choosing window size of 20 in s-ME technique. Comparing to error rate of CRF which is 1.17% this is a statistically significant improvment.

Related papers