Suranah writeup for Cohen 2005
From Cohen Courses
This is a review of the paper Cohen_2005_stacked_sequential_learning by user:Suranah.
I found it quite a revelation that the history feature in an MEMM can cause it to overfit substantially. It was interesting to learn that this is not so much overfitting in the conventional sense, but rather a consequence of the mismatch in accuracy of the history labels between training (where true previous labels are available) and testing (where only noisy predicted labels are). Until I attended the lecture, I could not find a natural intuition behind this approach, since we were not testing the stacked approach on the MEMM itself; this became clearer after the lecture.
And the answer, in terms of how many times a learner must be trained, is K+2 for any base learner, where K is the number of subsets into which the training data is partitioned (one base classifier per held-out subset, plus one base classifier on the full data, plus the stacked learner).
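To see where K+2 comes from, here is a minimal sketch of the stacked sequential training procedure that simply counts learner trainings. The toy majority-class "learner" and the synthetic data are illustrative assumptions, not the paper's actual base learners or features:

```python
# Sketch of stacked sequential learning (Cohen & Carvalho 2005),
# counting how many times a learner is trained. The majority-class
# "learner" below is a hypothetical placeholder for any base learner.
from collections import Counter

train_calls = 0

def train(X, y):
    """Toy base learner: always predicts the majority label."""
    global train_calls
    train_calls += 1
    majority = Counter(y).most_common(1)[0][0]
    return lambda X: [majority] * len(X)

def stacked_train(X, y, K=5):
    n = len(X)
    folds = [list(range(i, n, K)) for i in range(K)]
    yhat = [None] * n
    # K trainings: cross-validation, so every example gets a predicted
    # label from a classifier that did not see it during training.
    for fold in folds:
        held = set(fold)
        Xtr = [X[i] for i in range(n) if i not in held]
        ytr = [y[i] for i in range(n) if i not in held]
        f_cv = train(Xtr, ytr)
        for i, p in zip(fold, f_cv([X[i] for i in fold])):
            yhat[i] = p
    # +1 training: base classifier on all the data (used at test time).
    f = train(X, y)
    # +1 training: stacked learner on features extended with the
    # cross-validated predicted labels (the "history" features).
    X_ext = [xi + [yhat[i]] for i, xi in enumerate(X)]
    g = train(X_ext, y)
    return f, g

X = [[i] for i in range(20)]
y = [i % 2 for i in range(20)]
stacked_train(X, y, K=5)
print(train_calls)  # K + 2 = 7
```

Using cross-validated predictions (rather than the base classifier's predictions on its own training data) is what makes the stacked learner see history labels whose accuracy matches what it will encounter at test time.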