Nlao writeup of McCallum 2000
This is a review of Freitag_2000_Maximum_Entropy_Markov_Models_for_Information_Extraction_and_Segmentation by user:Nlao.
I have a concern about why the MEMM works better than the HMM. The authors claim that this is because the "MEMM allows state transitions to depend on non-independent features". But I would rather attribute it to the difference between generative and discriminative models.
The HMM tries to model the distribution of observation features conditioned on states, which is a much harder task than directly modeling the distribution of states given observations (as in the MEMM). Since there are many more features than states, the HMM is more likely to overfit the data.
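To make the contrast concrete, here is roughly the difference in factorization (my notation; the exponential form follows the paper's maxent transition functions):

\[
P_{\mathrm{HMM}}(\mathbf{o}, \mathbf{s}) \;=\; \prod_{t=1}^{T} P(s_t \mid s_{t-1})\, P(o_t \mid s_t)
\]
\[
P_{\mathrm{MEMM}}(\mathbf{s} \mid \mathbf{o}) \;=\; \prod_{t=1}^{T} P(s_t \mid s_{t-1}, o_t),
\qquad
P(s_t \mid s_{t-1}, o_t) \;=\; \frac{1}{Z(o_t, s_{t-1})}\, \exp\!\Big( \sum_{a} \lambda_a f_a(o_t, s_t) \Big)
\]

The HMM spends its parameters on the observation model P(o_t | s_t), while the MEMM only ever conditions on the observations, which is exactly the generative/discriminative distinction I mean above.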
Furthermore, I don't see how "non-independent features" are actually modeled in the MEMM. The 24 boolean features are fed into a maxent model (which is a linear model), and no combinations of features are explored; see the sketch below.
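A minimal sketch of that point (not the paper's code; the state names follow the FAQ segmentation task and the feature count follows the 24 boolean features mentioned above): each previous state gets its own maxent (multinomial logistic) transition model, and its decision function is linear in the raw features, so conjunctions of features are only captured if they are added as explicit features.

```python
# Sketch of a MEMM-style transition model: one maxent classifier per previous
# state, P(s_t | s_{t-1}, o_t), over boolean observation features.
# Data below is synthetic; in the paper each model is trained on the
# observations that followed the corresponding previous state.
import numpy as np
from sklearn.linear_model import LogisticRegression

n_features = 24                                   # boolean observation features
states = ["head", "question", "answer", "tail"]   # FAQ segmentation states

# One maxent model per previous state s_{t-1}.
transition_models = {s: LogisticRegression(max_iter=1000) for s in states}

# Toy training data: rows are feature vectors f(o_t), labels are next states s_t.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, n_features)).astype(float)
y = rng.choice(states, size=200)
for s in states:
    # In practice: fit only on time steps where the previous state was s.
    transition_models[s].fit(X, y)

# The score for each next state is a weighted sum of the 24 raw features:
# there are no interaction terms, so overlapping features can co-occur without
# a generative independence assumption, but feature conjunctions are not modeled.
probs = transition_models["question"].predict_proba(X[:1])
print(dict(zip(transition_models["question"].classes_, probs[0].round(3))))
```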
[minor points]
- I wonder if there is any follow-up work with distributed state representations.
- Good question - I don't know of any, at least that's been described as such.... - Wcohen 14:31, 24 September 2009 (UTC)