Rbalasub writeup of Cohen and Carvalho
A review of Cohen_2005_stacked_sequential_learning by user:rbalasub
The paper proposes stacked sequential learning, a meta-learning method. The authors' experiments show that it improves performance on sequential labeling tasks over traditional sequential models like CRFs and MEMMs. In sequential models, the label of the previous symbol in the sequence is used to predict the label of the current symbol. There is, however, a problem: while the training algorithm has access to the true labels of all symbols, at test time the decoder has access only to predicted labels. This often leads the trainer to assign high weights to the history feature, which can be detrimental at test time when the history feature is noisy (since it is a prediction). Stacked sequential learning tackles this problem by training on the predicted labels of previous symbols instead of the actual labels; a cross-validation-like procedure is used to simulate test-time conditions, as sketched below.
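To make the procedure concrete, here is a minimal sketch in Python. It assumes a single sequence of feature vectors `X` with labels `y` and a non-sequential base learner; the fold splitting, the +/- 1 label window, and the use of scikit-learn's `LogisticRegression` are illustrative choices of mine, not the paper's exact setup (the paper splits whole sequences into folds and can use any base learner).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def cross_val_predictions(X, y, n_folds=5):
    """Simulate test-time conditions: each training example's label is
    predicted by a model that never saw that example during training."""
    y_hat = np.zeros_like(y)
    folds = np.array_split(np.arange(len(y)), n_folds)
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(y)), held_out)
        base = LogisticRegression(max_iter=1000).fit(X[train], y[train])
        y_hat[held_out] = base.predict(X[held_out])
    return y_hat

def extend_features(X, y_hat, window=1):
    """Augment each example with the *predicted* labels of its neighbors
    within +/- window positions (sequence ends are padded with 0)."""
    n = len(y_hat)
    cols = []
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        shifted = np.zeros(n)
        src = np.arange(n) + offset
        valid = (src >= 0) & (src < n)
        shifted[valid] = y_hat[src[valid]]
        cols.append(shifted)
    return np.hstack([X, np.column_stack(cols)])

def train_stacked(X, y):
    y_hat = cross_val_predictions(X, y)  # noisy history, as at test time
    base = LogisticRegression(max_iter=1000).fit(X, y)
    stacked = LogisticRegression(max_iter=1000).fit(extend_features(X, y_hat), y)
    return base, stacked

def predict_stacked(base, stacked, X):
    y_hat = base.predict(X)  # first pass: base predictions become history
    return stacked.predict(extend_features(X, y_hat))
```

The key point is that the second-stage learner only ever sees history labels produced by cross-validated predictions, so the weight it learns for the history features reflects how reliable those features actually are at test time.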
Questions
- Why does the MEMM perform worse than ME? Isn't it the same as ME with one additional feature? Is it the Viterbi decoding?
- In the experiment with added noise, how does the MEMM's error rate improve?
- In s-CRF, will there be two types of history features: one from the extended history, and another from the natural history features that the CRF constructs?