Mnduong writeup of McCallum et al.
From Cohen Courses
This is a review of Frietag_2000_Maximum_Entropy_Markov_Models_for_Information_Extraction_and_Segmentation by user:mnduong.
- This paper introduced a new method for modeling sequential data: Maximum Entropy Markov Models (MEMMs). The method was meant to address several shortcomings of HMMs. In particular, it allows a richer representation of each observation as a set of overlapping features (e.g. POS tag, capitalization) instead of just the identity of the observation (e.g. the word itself). Instead of estimating a joint distribution over states and observations as HMMs do, MEMMs model the conditional probability of each state, given its preceding state and the observation at that position.
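The overlapping-feature idea above can be sketched as a simple extractor. This is a minimal illustration, not the paper's exact feature set; the feature names and the supplied POS tag are assumptions:

```python
def token_features(token, pos_tag):
    """Overlapping binary features for one observation (illustrative
    feature set; the POS tag is assumed to come from an external tagger)."""
    feats = {
        "identity=" + token.lower(): 1,   # the classic HMM-style feature
        "pos=" + pos_tag: 1,              # part-of-speech tag
        "is_capitalized": int(token[0].isupper()),
        "is_all_caps": int(token.isupper()),
        "has_digit": int(any(c.isdigit() for c in token)),
    }
    return {k: v for k, v in feats.items() if v}

# A single token fires several features at once -- something a plain HMM
# emission model (one atomic symbol per observation) cannot express.
print(token_features("FAQ", "NNP"))
```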
- The state-observation transition function is trained within the maximum entropy framework and takes an exponential form. Its parameters are estimated with the Generalized Iterative Scaling (GIS) algorithm.
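The exponential form of the per-state transition function can be sketched as follows. This is a minimal sketch with placeholder weights; in the paper, one exponential model is trained per previous state and the lambda weights are learned by GIS, not set by hand:

```python
import math

def transition_prob(weights, feature_fn, prev_state, obs, states):
    """P(s | s', o): one exponential model per previous state s', scoring
    each next state s by exp(sum_a lambda_a * f_a(o, s)), normalized over
    the next states. `weights[prev_state]` maps (feature, next_state)
    pairs to lambda values (placeholders here; the paper estimates them
    with Generalized Iterative Scaling)."""
    w = weights.get(prev_state, {})
    scores = {s: math.exp(sum(w.get((a, s), 0.0)
                              for a in feature_fn(obs, s)))
              for s in states}
    z = sum(scores.values())  # normalizer Z(o, s')
    return {s: v / z for s, v in scores.items()}

# Tiny usage example with made-up weights and a one-feature extractor.
weights = {"question": {("ends_with_qmark", "answer"): 2.0}}
feature_fn = lambda obs, s: ["ends_with_qmark"] if obs.endswith("?") else []
print(transition_prob(weights, feature_fn, "question",
                      "How do I unsubscribe?", ["question", "answer"]))
```

Because each per-state model is a normalized conditional distribution, the weight on an observation feature directly shifts probability mass between candidate next states.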
- The method was evaluated on the task of segmenting question-answer (FAQ) documents. It outperformed baselines that included a stateless maximum-entropy model, a traditional token-level HMM, and a feature HMM in which each line was converted to a set of feature firings, and the fired features were the symbols emitted by the HMM.
- The paper gave a very thorough and convincing motivation for the method it proposed.
- Sufficient details of the new model were given.
- I also liked the variations the authors presented to address various problems, such as data sparseness.