Mnduong writeup of Cohen & Carvalho


This is a review of Cohen_2005_stacked_sequential_learning by user:mnduong.

  • This paper introduces stacked sequential learning, a meta-learning method that can be wrapped around any base learner for sequential partitioning tasks, i.e., sequence labeling problems where the labels form long runs of identical values.
  • The method is motivated by the MEMM's low accuracy on such tasks. The MEMM's errors were traced to its local models putting too much weight on the history labels, which in turn stems from a mismatch between training and testing: the local models are trained on the true history labels, but at test time they see predicted ones.
  • To address this mismatch, the method also trains on predicted values of the history labels, obtained via cross-validation on the training data (a sketch of the procedure follows this list). The method generalizes to windows containing any number of history and future labels in the local models; increasing the window size reduced the error rate but, unsurprisingly, also increased training time.
  • With maximum entropy as the base learner, stacked sequential learning outperformed the MEMM even when only one history label and no future labels were used. With 20 history and 20 future labels, it also outperformed a CRF, and stacking with a CRF as the base learner outperformed the plain CRF as well.
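
Below is a minimal sketch of the stacked training and prediction procedure described above, assuming scikit-learn-style estimators and integer-coded labels. The names augment, fit_stacked, W_h, and W_f are illustrative rather than from the paper, and the folds here are a plain K-fold rather than the per-sequence split the paper uses.

 # Minimal sketch of stacked sequential learning (not the paper's code).
 import numpy as np
 from sklearn.base import clone
 from sklearn.linear_model import LogisticRegression
 from sklearn.model_selection import cross_val_predict

 def augment(X, y_hat, W_h=1, W_f=0):
     """Append predicted labels from a window of W_h history and W_f future
     positions as extra features for every token."""
     offsets = list(range(1, W_h + 1)) + list(range(-1, -W_f - 1, -1))
     cols = []
     for d in offsets:
         shifted = np.roll(y_hat, d)   # shifted[i] == y_hat[i - d]
         if d > 0:
             shifted[:d] = 0           # pad positions before the sequence start
         else:
             shifted[d:] = 0           # pad positions past the sequence end
         cols.append(shifted.reshape(-1, 1))
     return np.hstack([X] + cols)

 def fit_stacked(X, y, base=None, K=5, W_h=1, W_f=0):
     base = base if base is not None else LogisticRegression(max_iter=1000)
     # 1. Cross-validation yields *predicted* labels on the training data,
     #    mimicking the noisy history labels the model will see at test time.
     y_hat = cross_val_predict(clone(base), X, y, cv=K)
     # 2. Base model f, trained on the full data, produces y_hat at test time.
     f = clone(base).fit(X, y)
     # 3. Stacked model f' is trained on examples augmented with predicted labels.
     f_prime = clone(base).fit(augment(X, y_hat, W_h, W_f), y)
     return f, f_prime

 def predict_stacked(f, f_prime, X, W_h=1, W_f=0):
     y_hat = f.predict(X)
     return f_prime.predict(augment(X, y_hat, W_h, W_f))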


  • If you used stacked sequential learning with naive Bayes as the base learner and K=10, how many times slower would it be than just running the base learner?
  • Ignoring the difference in size between the cross-validation training sets and the full training set, it should be K + 2 = 12 times slower than running naive Bayes once: K = 10 models for the cross-validation predictions, one base model on the full training set, and one stacked model on the augmented training set.