Selen writeup of Cohen 2005

From Cohen Courses

This is a review of Cohen_2005_stacked_sequential_learning by user:Selen.

In this paper, the authors apply stacking to a sequential learning problem. The motivation is that certain methods may perform badly on test data even though they perform well in training, and that the sequential properties of the data are often ignored in tasks such as sequential learning. Their method is as follows: they split the data into K partitions; for each partition, they train a base learner (in this case ME and CRF) on the other K-1 partitions and apply the resulting model to the held-out partition; they then augment the original data with these predictions and retrain.
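The procedure described above can be sketched roughly as follows. This is only a minimal illustration under several assumptions, not the authors' implementation: a toy nearest-centroid classifier stands in for ME/CRF, labels are assumed to be the integers 0 and 1 so they can be appended as features, and the names train_centroid, flatten, extend, stacked_train and the window parameter (how many neighbouring predictions are appended to each example) are all hypothetical.

```python
def train_centroid(X, y):
    """Toy base learner (a stand-in for ME/CRF): nearest class centroid."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

    def predict(x):
        # Return the label whose centroid is closest in squared Euclidean distance.
        return min(centroids,
                   key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))
    return predict


def flatten(seqs):
    """Turn a list of (xs, ys) sequences into flat example and label lists."""
    X = [x for xs, _ in seqs for x in xs]
    y = [label for _, ys in seqs for label in ys]
    return X, y


def extend(xs, preds, window):
    """Append the predictions at positions t-window..t+window to each x_t.
    Positions outside the sequence are padded with 0 (an assumption of this sketch)."""
    out = []
    for t, x in enumerate(xs):
        ctx = [preds[t + d] if 0 <= t + d < len(preds) else 0
               for d in range(-window, window + 1)]
        out.append(list(x) + ctx)
    return out


def stacked_train(seqs, train, k=2, window=1):
    """Stacking over sequences; requires k <= len(seqs) so no fold is empty."""
    base = train(*flatten(seqs))            # initial classifier, used at test time
    folds = [seqs[i::k] for i in range(k)]  # K partitions of the sequences
    extended = []
    for i, held_out in enumerate(folds):
        rest = [s for j, fold in enumerate(folds) if j != i for s in fold]
        f_i = train(*flatten(rest))         # trained on the other K-1 partitions
        for xs, ys in held_out:
            preds = [f_i(x) for x in xs]    # predictions on the held-out data
            extended.append((extend(xs, preds, window), ys))
    final = train(*flatten(extended))       # retrain on the augmented data

    def predict(xs):
        preds = [base(x) for x in xs]
        return [final(x) for x in extend(xs, preds, window)]
    return predict
```

Here each sequence is a pair (xs, ys) of feature vectors and labels, and a call such as stacked_train(seqs, train_centroid, k=10) returns a function that labels a new sequence of feature vectors. The held-out predictions matter because the final classifier then sees predictions of roughly the same quality at training time as it will see at test time.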

As an answer to your question, with K = 10 they increase the running time of naive Bayes by a factor of 12, since they train an initial classifier, a final classifier, and 10 classifiers for the cross-validation part (K + 2 = 12 training runs).

I think this is a pretty elegant idea. I wonder if it can be applied to other tasks such as TFBS (transcription factor binding site) finding, since binding sites are often found sequentially in the upstream region of the genome.