Philgoo Han writeup of Cohen and Carvalho

This is a review of Cohen_2005_stacked_sequential_learning by user:Ironfoot.

  • A meta-learning algorithm applicable to any base learner
  • High error rate for MEMM (on this particular training data)
    • Different from label bias or observation bias; it is attributed to too strong a weight on the history feature.
    • Since the training and test data come from the same source, the two should have the same characteristics. One reason I can think of for the large error rate is that a single false prediction makes all the following text lines false in an MEMM (see the toy sketch below). But can this single cause magnify the error rate more than tenfold?
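The cascading-error conjecture above can be made concrete with a toy sketch of my own (not from the paper): a greedy CMM-style tagger whose hand-picked history weight outweighs ambiguous observations, so a single noisy token corrupts every later prediction.

```python
# Toy illustration (hypothetical weights, not the paper's model):
# a greedy MEMM/CMM-style tagger where the history feature outweighs
# ambiguous observations, so one error cascades down the sequence.
OBS_SCORE = {"H": 3.0, "B": -3.0, "?": 0.0}  # observation cue per token

def classify(x, prev):
    history = 2.0 if prev == "head" else -2.0  # strong weight on history
    return "head" if history + OBS_SCORE[x] > 0 else "body"

def greedy_tag(xs, start="body"):
    labels, prev = [], start
    for x in xs:
        prev = classify(x, prev)
        labels.append(prev)
    return labels

# True labels are all "body"; only the first token is a noisy "H".
print(greedy_tag(["H", "?", "?", "?"]))  # -> ['head', 'head', 'head', 'head']
```

Still, a tenfold error blow-up would require such cascades to be both long and frequent, so the question stands.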
  • Sequential stacking
    • Make a cross-validated prediction ŷ of y with the base learner, then train a second learner on examples extended with a window of nearby predictions (sketched below).
    • Why is "ŷ similar to the prediction produced by an f learned by A on a size-m sample that does not include x"?
    • I don't get the meaning of "f will not be used as the inner loop of a Viterbi or beam-search."
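For reference, below is a minimal sketch of the stacked sequential learning meta-algorithm as I read it, with scikit-learn standing in for the base learner A; the function names, the single window radius w (the paper allows separate history and future sizes W_h and W_f), and the -1 sentinel padding are my own choices. The cross_val_predict step is what makes ŷ "similar to the prediction produced by an f learned on a sample that does not include x": each ŷ_i comes from a fold whose training split excluded x_i. It also suggests a reading of the Viterbi remark: the stage-2 classifier f' treats neighboring predictions as ordinary features, so inference is two classification passes rather than a Viterbi or beam search over f.

```python
# Minimal sketch of stacked sequential learning (my reading of the paper).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

def extend(X, y_hat, w=1):
    """Append the w previous and w future predicted labels to each row;
    out-of-range positions are padded with a sentinel label (-1)."""
    n = len(y_hat)
    padded = np.concatenate([np.full(w, -1), y_hat, np.full(w, -1)])
    windows = np.array([padded[i:i + 2 * w + 1] for i in range(n)])
    return np.hstack([X, windows])

def train_stacked(X, y, w=1, K=5):
    # y_hat[i] is predicted by a model fit on folds that exclude x_i.
    y_hat = cross_val_predict(LogisticRegression(max_iter=1000), X, y, cv=K)
    f = LogisticRegression(max_iter=1000).fit(X, y)                     # stage 1
    f2 = LogisticRegression(max_iter=1000).fit(extend(X, y_hat, w), y)  # stage 2
    return f, f2

def predict_stacked(f, f2, X, w=1):
    y_hat = f.predict(X)                    # first pass: base predictions
    return f2.predict(extend(X, y_hat, w))  # second pass: stacked model
```

(For simplicity this treats X as one long sequence; in practice the windows should not cross document boundaries.)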
  • Result
    • Much lower error rate; using a moderately large window (history) size improves precision.
    • This may be an implementation issue, but how do you handle the boundary conditions where there is not enough history or future context? (One option is sketched below.)
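On the boundary question above: one plausible scheme (my assumption, not confirmed by the paper) is to pad the prediction sequence with a sentinel label, as the extend() helper in the earlier sketch does, so the learner can treat "no history available" as a signal of its own.

```python
# Hypothetical boundary handling: pad predictions with a sentinel (-1).
import numpy as np

y_hat = np.array([0, 1, 1, 0])  # predicted labels for a 4-line sequence
w = 2                           # window radius
padded = np.concatenate([np.full(w, -1), y_hat, np.full(w, -1)])
print(padded[0:2 * w + 1])      # window at position 0: [-1 -1  0  1  1]
```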
  • Also works well on different datasets.
    • It seems that the sequential stacking algorithm can mitigate CMM bias problems in general.