Mnduong writeup of Collins 2002

This paper introduces a discriminative method for training the parameters of a sequence model for tagging. The method is a variant of the perceptron algorithm discussed by Freund & Schapire 1999. Each position in the sequence has local features similar to those used in maximum entropy models. Each global feature is the sum of one local feature over all positions in the sequence, so when the local features are binary, the global features are effectively counts. Each global feature is associated with one parameter that needs to be trained.
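To make the local/global distinction concrete, here is a minimal sketch of how binary local features sum into global counts. The two feature templates (a tag trigram and a word/tag pair), the `<s>` padding symbol, and the function names are assumptions chosen for illustration, not the paper's exact feature set.

```python
from collections import Counter

def local_features(words, tags, i):
    """Binary local features that fire at position i (hypothetical templates)."""
    prev2 = tags[i - 2] if i >= 2 else "<s>"  # "<s>" pads the start of the sequence
    prev1 = tags[i - 1] if i >= 1 else "<s>"
    return [
        ("trigram", prev2, prev1, tags[i]),
        ("word_tag", words[i], tags[i]),
    ]

def global_features(words, tags):
    """Sum the local features over all positions; the result is a count vector."""
    counts = Counter()
    for i in range(len(words)):
        counts.update(local_features(words, tags, i))
    return counts

words = ["the", "dog", "the", "dog"]
tags = ["D", "N", "D", "N"]
phi = global_features(words, tags)
print(phi[("word_tag", "the", "D")])  # the pair (the, D) fires at two positions
```

The model would then score a tagged sequence as the dot product between this count vector and the parameter vector.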
The training algorithm is iterative. In each iteration, every training sequence is tagged with the current parameters. If the prediction differs from the true tags, each parameter is increased by the number of occurrences of its feature (for example, a tag trigram) in the true tag sequence and decreased by the number of occurrences in the predicted sequence. After the last iteration, the algorithm averages the parameters over all iterations instead of taking the final values.
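The update and the averaging step can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the feature templates and all function names are hypothetical, and decoding is done by brute-force enumeration of tag sequences (workable only for toy inputs) where the paper uses Viterbi search.

```python
from collections import defaultdict
from itertools import product

def features(words, tags):
    """Global feature counts for a tagged sequence (hypothetical templates)."""
    counts = defaultdict(float)
    for i, (word, tag) in enumerate(zip(words, tags)):
        prev2 = tags[i - 2] if i >= 2 else "<s>"
        prev1 = tags[i - 1] if i >= 1 else "<s>"
        counts[("trigram", prev2, prev1, tag)] += 1
        counts[("word_tag", word, tag)] += 1
    return counts

def score(weights, feats):
    return sum(weights.get(f, 0.0) * c for f, c in feats.items())

def decode(words, tagset, weights):
    """Brute-force argmax over all tag sequences (stand-in for Viterbi)."""
    return max(product(tagset, repeat=len(words)),
               key=lambda tags: score(weights, features(words, tags)))

def train(data, tagset, iterations=5):
    weights = defaultdict(float)  # current parameters
    totals = defaultdict(float)   # running sum of parameters, used for averaging
    steps = 0
    for _ in range(iterations):
        for words, gold in data:
            pred = decode(words, tagset, weights)
            if pred != tuple(gold):
                # Add counts from the true tags, subtract counts from the prediction.
                for f, c in features(words, gold).items():
                    weights[f] += c
                for f, c in features(words, pred).items():
                    weights[f] -= c
            for f, w in weights.items():
                totals[f] += w
            steps += 1
    # Average over every (iteration, example) step instead of returning the final weights.
    return {f: total / steps for f, total in totals.items()}

data = [(["the", "dog"], ["D", "N"])]
averaged = train(data, tagset=["D", "N"])
print(decode(["the", "dog"], ["D", "N"], averaged))
```

Features that occur equally often in the true and predicted sequences cancel out, so only the parameters tied to the tagging errors move.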
In the evaluation section, I would like to see this model compared against CRFs, and even against traditional HMMs, since the model is supposed to be similar to an HMM, with the only difference lying in how the parameters are trained.
I'm not sure why this is considered a variation of HMMs. To me, the model is closer to a MaxEnt model, with a different training criterion and method.
