Nlao writeup of McCallum 2000

Latest revision as of 10:42, 3 September 2010

This is a review of Frietag_2000_Maximum_Entropy_Markov_Models_for_Information_Extraction_and_Segmentation by user:Nlao.

I have a concern about why MEMM works better than HMM. The authors claim that this is because "MEMM allows state transitions to depend on non-independent features". But I would rather attribute it to the difference between generative models and discriminative models.

HMM tries to model the distribution of features conditioned on states, which is a much harder task than directly modeling the distribution of states (as in MEMM). Since there are more features than states, the HMM is more likely to overfit the data.
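The contrast above can be sketched concretely. In an MEMM the local model is a maxent (softmax) distribution over next states given the previous state and the current observation features, so parameters grow with (state, feature, state) triples rather than with the full feature space an HMM emission model would have to cover. This is a minimal illustration with hypothetical states and weights, not the paper's actual feature set:

```python
import math

# Hypothetical label set; the weights below are made up for illustration.
STATES = ["B", "I"]

def memm_transition(weights, prev_state, features):
    """P(next state | prev_state, features) as a maxent/softmax over states.

    `weights` maps (prev_state, feature_name, next_state) -> real weight;
    `features` maps feature_name -> 0/1 (boolean features, as in the paper).
    """
    scores = {}
    for s in STATES:
        scores[s] = sum(weights.get((prev_state, f, s), 0.0)
                        for f, on in features.items() if on)
    z = sum(math.exp(v) for v in scores.values())
    return {s: math.exp(v) / z for s, v in scores.items()}

# Toy weights: a "capitalized" feature pushes toward staying in state "B".
w = {("B", "capitalized", "B"): 1.5, ("B", "capitalized", "I"): -0.5}
p = memm_transition(w, "B", {"capitalized": 1, "digit": 0})
```

Note that the discriminative model never has to assign probability to the observation features themselves; it only scores states given them.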

Furthermore, I don't see how "non-independent features" are modeled in the MEMM. The 24 boolean features are fed to a maxent model (which is a linear model), and no combinations of features are explored.
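To make the point above concrete: a linear model over boolean features can only add per-feature weights, so a decision that depends on a conjunction of two features is inexpressible unless a product feature is added by hand. This is a hypothetical sketch (feature names `f1`, `f2` are made up, not from the paper):

```python
def linear_score(weights, feats):
    """Score of a linear (maxent-style) model: sum of weight * feature value."""
    return sum(weights.get(name, 0.0) * val for name, val in feats.items())

def with_conjunction(feats):
    """Add an explicit product feature f1*f2, making the interaction linear."""
    out = dict(feats)
    out["f1&f2"] = feats.get("f1", 0) * feats.get("f2", 0)
    return out

# A weight on the conjunction feature fires only when both inputs are on.
w = {"f1&f2": 2.0}
both_on = linear_score(w, with_conjunction({"f1": 1, "f2": 1}))
one_on = linear_score(w, with_conjunction({"f1": 1, "f2": 0}))
```

Without such hand-built conjunctions, "non-independent" in the paper can only mean that overlapping features are tolerated (no independence assumption is violated, as it would be in an HMM), not that feature interactions are actually modeled.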

[minor points]

- wonder if there is any follow-up work with distributed state representations

  • Good question - I don't know of any, at least that's been described as such.... - Wcohen 14:31, 24 September 2009 (UTC)