Stacked Sequential Learning
Stacked sequential learning is a meta-learning method that deals with the mismatch between training and testing data for sequential models, proposed by Cohen and Carvalho (2005). It stacks two stages of prediction, where the second stage makes use of the results of the first stage.
Motivation
Consider the general form of sequential prediction, in which we need to predict the label sequence $y = (y_1, \ldots, y_n)$ given the observation sequence $x = (x_1, \ldots, x_n)$. The prediction of one label $y_i$ will depend on neighboring labels, typically $y_{i-1}$ and $y_{i+1}$. During training, we have the true neighboring labels; but during testing, $y_i$ will be predicted based on the predicted neighboring labels. Because the assumptions made by the model never match reality exactly, the distribution of the predicted neighboring labels differs from that of the true ones, and this mismatch can degrade performance.
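The solution is a two-stage approach. In the first stage, we train a base classifier and use it to produce predicted labels for the training data; these predictions are obtained with cross-validation, so that each training example is labeled by a model that did not see it, as will be the case at test time. In the second stage, we train another classifier whose input includes the predicted labels of neighboring positions instead of the true labels, so that it can learn from the mistakes made by the first classifier.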
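The two stages can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the reference implementation from the paper: the toy learner `MajorityLookup`, the helpers `extend`, `fit_stacked` and `predict_stacked`, the `"PAD"` marker, the symmetric window, and the toy data are all hypothetical names and choices made for this example. The sketch also assumes there are at least `k` training sequences, so that every fold has something to train on.

```python
from collections import Counter, defaultdict

class MajorityLookup:
    """Toy base learner (an assumption for this sketch): predicts the label
    seen most often for an exact feature tuple, else the overall majority."""
    def fit(self, X, y):
        counts = defaultdict(Counter)
        for features, label in zip(X, y):
            counts[tuple(features)][label] += 1
        self.table = {f: c.most_common(1)[0][0] for f, c in counts.items()}
        self.default = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.table.get(tuple(f), self.default) for f in X]

def extend(seq, preds, window, pad="PAD"):
    """Append the predicted labels at positions i-window..i+window
    (including i itself) to the features of each position i."""
    n = len(seq)
    rows = []
    for i in range(n):
        context = [preds[j] if 0 <= j < n else pad
                   for j in range(i - window, i + window + 1)]
        rows.append(list(seq[i]) + context)
    return rows

def fit_stacked(seqs, labels, make_learner, k=2, window=1):
    # Stage 1: cross-validated predictions, splitting by whole sequences,
    # so each training sequence is labeled by a model that never saw it.
    cv_preds = [None] * len(seqs)
    for fold in range(k):
        train = [s for s in range(len(seqs)) if s % k != fold]
        X = [x for s in train for x in seqs[s]]
        y = [l for s in train for l in labels[s]]
        model = make_learner().fit(X, y)
        for s in range(fold, len(seqs), k):
            cv_preds[s] = model.predict(seqs[s])
    # The base model used at test time is trained on all the data.
    flat_y = [l for ls in labels for l in ls]
    base = make_learner().fit([x for s in seqs for x in s], flat_y)
    # Stage 2: train on observations extended with *predicted* neighbor
    # labels, but with the true labels as targets.
    X2 = [row for s, p in zip(seqs, cv_preds) for row in extend(s, p, window)]
    stacked = make_learner().fit(X2, flat_y)
    return base, stacked

def predict_stacked(base, stacked, seq, window=1):
    return stacked.predict(extend(seq, base.predict(seq), window))

# Toy data: the token "?" takes the label of the token before it.
a, b, q = ("a",), ("b",), ("?",)
seqs = [[a, q], [a, q], [b, q], [b, q]]
labels = [["A", "A"], ["A", "A"], ["B", "B"], ["B", "B"]]
base, stacked = fit_stacked(seqs, labels, MajorityLookup, k=2, window=1)
print(base.predict([b, q]))                    # base alone: ['B', 'A']
print(predict_stacked(base, stacked, [b, q]))  # stacked:    ['B', 'B']
```

In this toy setting the base learner sees each token in isolation and cannot tell which label "?" should take, so it always answers "A". The second-stage learner sees the base prediction for the preceding token and learns to correct that systematic mistake.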