Philgoo Han writeup of Collins

This is a review of Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms, Collins, EMNLP 2002 by user:Ironfoot.

  • Training HMMs with the perceptron algorithm
    • Probability structure as HMM
    • Estimate function as maxent
    • Update as perceptron
  • Is there a reason that only two states of history are used? A window size of 20 showed much lower error in Cohen (2005)
  • Using the averaged (or weighted-average) parameters performs better than using only the parameters from the last step. This is not intuitive, and surprising
  • Comparing results with other models (all the HMMs below, CRF, voted perceptron, etc.) would be interesting
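
The perceptron update and parameter averaging noted above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation. It assumes a first-order model (one previous tag, whereas the paper conditions on the two previous tags), simple indicator features, and a plain average of the weights taken after every training example. The names `features`, `viterbi`, `global_features`, and `train` are hypothetical.

```python
from collections import defaultdict

START = "<s>"  # assumed start-of-sentence pseudo-tag


def features(words, i, prev_tag, tag):
    """Hypothetical local features: one emission-like, one transition-like."""
    return [("emit", words[i], tag), ("trans", prev_tag, tag)]


def viterbi(words, tags, w):
    """Return the highest-scoring tag sequence under weights w."""
    # best[i][t] = (score of the best path ending in tag t at position i, backpointer)
    best = [{t: (sum(w[f] for f in features(words, 0, START, t)), None) for t in tags}]
    for i in range(1, len(words)):
        col = {}
        for t in tags:
            col[t] = max(
                (best[i - 1][p][0] + sum(w[f] for f in features(words, i, p, t)), p)
                for p in tags
            )
        best.append(col)
    # Follow backpointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


def global_features(words, tag_seq):
    """Sum the local feature counts over a whole tagged sentence."""
    counts = defaultdict(int)
    prev = START
    for i, t in enumerate(tag_seq):
        for f in features(words, i, prev, t):
            counts[f] += 1
        prev = t
    return counts


def train(data, tags, epochs=5):
    """Return (last-step weights, averaged weights)."""
    w = defaultdict(float)      # current weights
    total = defaultdict(float)  # running sum of the weights after each example
    steps = 0
    for _ in range(epochs):
        for words, gold in data:
            pred = viterbi(words, tags, w)
            if pred != gold:
                # Perceptron update: promote gold features, demote predicted ones.
                for f, c in global_features(words, gold).items():
                    w[f] += c
                for f, c in global_features(words, pred).items():
                    w[f] -= c
            for f, v in w.items():
                total[f] += v
            steps += 1
    avg = defaultdict(float, {f: v / steps for f, v in total.items()})
    return w, avg
```

Calling `train(data, tags)` on a list of `(words, gold_tags)` pairs returns both weight vectors, so decoding with the last-step and the averaged parameters can be compared on held-out sentences.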


  • We have seen four variations of HMM so far
    • HMM trained with joint likelihood
    • HMM trained with conditional likelihood
    • Maxent HMM - modification in structure
    • HMM trained with the perceptron algorithm