Liuy writeup: Cohen 2005 Stacked Sequential Learning
This is a review of Cohen_2005_stacked_sequential_learning by user:Liuy.
The paper uses cross-validation to handle the mismatch between training data (where features are built from the true labels of nearby examples) and test data (where only predicted labels are available), a mismatch that arises because adjacent labels are correlated. As a meta-learning algorithm, sequential stacking lets any base learner exploit the (predicted) labels of nearby examples. It is interesting that, empirically, sequential stacking beats the non-sequential baselines and also outperforms CRFs and maxent. I like the trick of stacking on top of MEMMs; it dramatically improves their empirical performance. It seems that algorithms using the stacking trick do better than those without it on NEQ; I am not sure how it performs on other NLP problems.
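To make the meta-algorithm concrete, here is a minimal sketch of stacked sequential learning as I understand it, using scikit-learn naive Bayes as the base learner. The function names, the default window of one neighbor on each side, the zero-padding at sequence boundaries, and the inclusion of the example's own predicted label are my own illustrative assumptions, not necessarily the paper's exact setup.

```python
# Sketch of stacked sequential learning (my reading of the paper), with a
# scikit-learn base learner; details of the window/encoding are assumptions.
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import KFold
from sklearn.naive_bayes import GaussianNB


def extend(X, y_hat, window):
    """Append predicted labels of the `window` previous and next examples
    as extra features (sequence boundaries padded with 0)."""
    n = len(y_hat)
    cols = []
    for shift in range(-window, window + 1):
        col = np.zeros(n)
        if shift < 0:
            col[-shift:] = y_hat[:shift]   # labels of earlier examples
        elif shift > 0:
            col[:-shift] = y_hat[shift:]   # labels of later examples
        else:
            col[:] = y_hat                 # the example's own predicted label
        cols.append(col.reshape(-1, 1))
    return np.hstack([X] + cols)


def stacked_sequential_fit(X, y, base=GaussianNB(), K=10, window=1):
    """Train f on (X, y) and f' on X extended with cross-validated
    predicted labels of neighboring examples."""
    y_hat = np.zeros(len(y))
    # Step 1: K-fold cross-validation produces "test-like" predicted labels
    # for every training example.
    for train_idx, test_idx in KFold(n_splits=K).split(X):
        f_cv = clone(base).fit(X[train_idx], y[train_idx])
        y_hat[test_idx] = f_cv.predict(X[test_idx])
    # Step 2: extend each example with predicted labels of nearby examples.
    X_ext = extend(X, y_hat, window)
    # Step 3: train f on the original data (used to produce y_hat at test time).
    f = clone(base).fit(X, y)
    # Step 4: train f' on the extended data.
    f_prime = clone(base).fit(X_ext, y)
    return f, f_prime


def stacked_sequential_predict(f, f_prime, X, window=1):
    """At test time: predict with f, extend with those labels, predict with f'."""
    y_hat = f.predict(X)
    return f_prime.predict(extend(X, y_hat, window))
```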
'If you used stacked sequential learning with naive Bayes as the base learner and K=10, how many times slower would it be than just running the base learner?' My answer is that stacked sequential learning with naive Bayes would take about K + 2 = 12 times as long as training a single naive Bayes learner: K trainings for the cross-validation classifiers, one training on the original dataset, and one training on the extended dataset.
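As a quick check, counting the base-learner fit calls in the sketch above for K = 10 gives the same number:

```python
# Quick check of the K + 2 count for K = 10.
K = 10
trainings = K + 1 + 1   # K cross-validation models + f on the original data + f' on the extended data
assert trainings == 12
```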