Mnduong writeup of McCallum et al.

This paper introduced a new method for modeling sequential data: Maximum Entropy Markov Models (MEMMs). The method was meant to address a few shortcomings of HMMs. Namely, it allows a richer representation of each observation in terms of overlapping features (e.g. POS, capitalization of the word) instead of just the identity of the observation (e.g. word identity). Instead of estimating a joint distribution over states and observations as HMMs do, MEMMs model the conditional probability of each state given its preceding state AND the current observation.
The state-observation transition function is trained within the maximum entropy framework and takes an exponential form. Its parameters are estimated using the Generalized Iterative Scaling (GIS) algorithm.
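To make the exponential form concrete, here is a minimal sketch of how such a transition distribution could be computed for one previous state and one observation. The function name, the weight layout (one weight table per previous state, keyed by feature name and next state), the toy features, and the weight values are all my own illustrative assumptions, not the paper's implementation. Training with GIS is not shown.

```python
import math

def memm_transition_probs(weights, features, prev_state, observation, states):
    """Return P(s | prev_state, observation) for every candidate state s.

    weights[prev_state] maps (feature_name, state) -> weight. This layout is
    a hypothetical choice made for the sketch.
    """
    # Binary features: collect the names of those that fire on the observation.
    active = [name for name, fires in features.items() if fires(observation)]
    # Unnormalized log-score of each next state: sum of weights of active features.
    scores = {
        s: sum(weights[prev_state].get((name, s), 0.0) for name in active)
        for s in states
    }
    # Exponentiate and normalize (shifted by the max score for numerical stability).
    top = max(scores.values())
    exp_scores = {s: math.exp(v - top) for s, v in scores.items()}
    z = sum(exp_scores.values())
    return {s: e / z for s, e in exp_scores.items()}

# Toy usage with made-up overlapping features and weights.
features = {
    "capitalized": lambda line: line[:1].isupper(),
    "ends_with_question_mark": lambda line: line.endswith("?"),
}
states = ["question", "answer"]
weights = {
    "question": {("ends_with_question_mark", "question"): 1.5},
    "answer": {("capitalized", "answer"): 0.5},
}
probs = memm_transition_probs(weights, features, "question", "What?", states)
```

Note that both features fire on the same observation here, which is the kind of overlap that an HMM emitting a single symbol per state cannot represent directly.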
The method was evaluated on the task of segmenting Q&A documents (FAQs). It outperformed baselines that included a stateless ME model, a traditional token HMM, and a feature HMM in which each line was converted to a sequence of features whose firings yielded the symbols emitted by the HMM.

The paper gave a very thorough and convincing motivation for the method it proposed.
Sufficient details of the new model were given.
I also liked the variations that the authors gave to address the various problems, such as data sparseness...
