Voted Perceptron
The voted perceptron method is based on the perceptron algorithm of Rosenblatt and Frank. The algorithm takes advantage of data that are linearly separable with large margins. This method is simpler to implement, and much more efficient in terms of computation time as compared to Vapnik's SVM. The algorithm can also be used in very high dimensional spaces using kernel functions.
Manabu Sassano, IJCNLP performed an experimental comparison of voted perceptron with SVM on classification tasks in NLP and observed that the voted perceptron is comparable to SVM in terms of accuracy and, in addition, as to learning time and prediction speed of voted perceptron is considerable better than SVM.
Summary
We assume that all instances are points , denotes Euclidean length of and the labels are in {-1, +1}. The online perceptron algorithm starts with an initial zero prediction vector . It predicts the label of a new instance to be . If this prediction differs from the label , it updates the prediction vector to . If the prediction is correct then is not changed. The process then repeats with the next example. It has been shown that if the data are linearly separable, then the perceptron algorithm will make a finite number of mistakes, and therefore, if repeatedly cycled through the training set, will converge to a vector which correctly classifies all of the examples. Moreover, the number of mistakes is upper bounded by a function of the gap between the positive and negative examples. In the voted-perceptron algorithm, we store more information during training and then use this elaborate information to generate better predictions on the test data. The information we maintain during training is the list of all prediction vectors that were generated after each and every mistake. For each such vector, we count the number of iterations it survives until the next mistake is made; we refer to this count as the weight of the prediction vector. To calculate a prediction we compute the binary prediction of each one of the prediction vectors and combine all these predictions by a weighted majority vote. The weights used are the survival times described above. This makes intuitive sense as good prediction vectors tend to survive for a long time and thus have larger weight in the majority vote.
Here is the algorithm