Nlao writeup of McCallum 2000

From Cohen Courses

This is a review of Frietag_2000_Maximum_Entropy_Markov_Models_for_Information_Extraction_and_Segmentation by user:Nlao.

I have a concern about why the MEMM works better than the HMM. The authors claim that this is because "MEMM allows state transition to depend on non-independent features". I would rather attribute it to the difference between generative and discriminative models.

An HMM tries to model the distribution of the features conditioned on the states, which is a much harder task than directly modeling the distribution of the states given the features (as the MEMM does). Since there are more features than states, the HMM has more to estimate from the same data and is more likely to overfit.
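The contrast can be written out as follows. This is only a sketch in my own notation (states $s_t$, observations $o_t$, binary features $f_a$ with weights $\lambda_a$), not copied from the paper:

```latex
% HMM (generative): models the joint, so it must model the observations
P(s_{1:T}, o_{1:T}) = \prod_{t=1}^{T} P(s_t \mid s_{t-1}) \, P(o_t \mid s_t)

% MEMM (discriminative): models only the states given the observations
P(s_{1:T} \mid o_{1:T}) = \prod_{t=1}^{T} P_{s_{t-1}}(s_t \mid o_t),
\qquad
P_{s'}(s \mid o) = \frac{1}{Z(o, s')} \exp\Big( \sum_{a} \lambda_a f_a(o, s) \Big)
```

The HMM has to spend parameters on $P(o_t \mid s_t)$, whereas the MEMM never models the observations at all.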

Furthermore, I don't see how "non-independent features" are modeled in the MEMM. The 24 boolean features are fed to a MaxEnt model (which is a log-linear model), and no combinations of features are explored.
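To make the point about linearity concrete, here is a minimal sketch of a MaxEnt classifier over boolean features. The state names, the weights, and the helper names `maxent_probs` and `log_odds` are all hypothetical and chosen only for illustration; they are not taken from the paper. The sketch checks that the log-odds between two states is additive in the features, i.e. turning on both features together adds nothing beyond the sum of their individual effects.

```python
import math

# Hypothetical weights: one weight per (state, boolean feature) pair.
WEIGHTS = {
    "question": [1.5, -0.5],
    "answer": [-1.0, 0.75],
}


def maxent_probs(weights, features):
    """Return P(state | features) for a MaxEnt (softmax) model."""
    scores = {
        state: sum(w * f for w, f in zip(ws, features))
        for state, ws in weights.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {state: math.exp(v - top) for state, v in scores.items()}
    z = sum(exps.values())
    return {state: e / z for state, e in exps.items()}


def log_odds(weights, features, a, b):
    """Log of P(a | features) / P(b | features)."""
    probs = maxent_probs(weights, features)
    return math.log(probs[a] / probs[b])


base = log_odds(WEIGHTS, [0, 0], "question", "answer")
only_f0 = log_odds(WEIGHTS, [1, 0], "question", "answer")
only_f1 = log_odds(WEIGHTS, [0, 1], "question", "answer")
both = log_odds(WEIGHTS, [1, 1], "question", "answer")

# Interaction term: zero (up to rounding) for any choice of weights,
# because the model has no conjunction feature such as f0 AND f1.
interaction = both - only_f0 - only_f1 + base
print(abs(interaction) < 1e-9)
```

Under these assumptions the interaction term vanishes regardless of the weights. The model tolerates overlapping or correlated features because it makes no independence assumption about them, but it would capture a genuine feature combination only if a conjunction were added explicitly as an extra feature.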

[minor points]

- I wonder whether there is any follow-up work with distributed state representations.

  • Good question - I don't know of any, at least that's been described as such.... - Wcohen 14:31, 24 September 2009 (UTC)