Wka writeup of Freitag 2000
This is a review of Freitag 2000, Maximum Entropy Markov Models for Information Extraction and Segmentation, by user:wka
The paper introduces a Markov sequence model, similar to HMMs, that represents observations through overlapping features, using maximum entropy to fit exponential models of the probability of the current state conditioned on the observation and the previous state (a minimal sketch follows the list below). This addresses several problems with the traditional approach of representing observation probabilities as multinomial distributions over a discrete, finite vocabulary:
- the multinomial model cannot benefit from richer representations of observations, in particular overlapping, non-independent features
- the multinomial model requires enumerating all possible observations, which is infeasible for tasks where the observation vocabulary cannot be listed in advance
- HMM parameters are trained to maximize the likelihood of the observation sequence, while the actual task is to predict the state sequence given the observation sequence
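To make the model concrete, here is a minimal sketch of the per-source-state exponential model the paper fits with maximum entropy; the weight layout and all names below are illustrative assumptions, not the paper's implementation.

```python
import math

def maxent_transition(lambdas, features, obs, states):
    """P(s | obs) for one source state's exponential model.

    The paper trains one such model per previous state. `lambdas` maps
    (feature_name, destination_state) -> weight; this layout is a guess
    for illustration only.
    """
    scores = {}
    for s in states:
        # Overlapping, non-independent binary features of the observation
        # are fine here: each active feature just adds its weight.
        total = sum(lambdas.get((f.__name__, s), 0.0)
                    for f in features if f(obs))
        scores[s] = math.exp(total)
    z = sum(scores.values())  # normalizer for this source state and observation
    return {s: scores[s] / z for s in states}
```

Decoding then proceeds with a Viterbi-style dynamic program over these conditional transition probabilities, much as in HMMs.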
MEMMs move away from the generative joint probability model of HMMs; instead of Baum-Welch, the maximum entropy weights are trained with Generalized Iterative Scaling (GIS), sketched below.
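For intuition, a hedged sketch of a single GIS update for one source state's model, reusing maxent_transition from the sketch above; the training-data layout is an assumption. GIS requires every training example to have the same total feature count C, which the paper satisfies by adding a correction feature.

```python
import math

def gis_step(lambdas, data, features, states, C):
    """One GIS update: move each weight by (1/C) * log(empirical/model expectation).

    data: list of (obs, next_state) pairs observed after this source state.
    C:    constant total feature count per example (GIS requirement).
    """
    n = len(data)
    # Empirical expectation of each (feature, destination state) pair.
    emp = {(f.__name__, s): 0.0 for f in features for s in states}
    for obs, s in data:
        for f in features:
            if f(obs):
                emp[(f.__name__, s)] += 1.0 / n
    # Model expectation under the current weights.
    mod = {k: 0.0 for k in emp}
    for obs, _ in data:
        dist = maxent_transition(lambdas, features, obs, states)
        for s in states:
            for f in features:
                if f(obs):
                    mod[(f.__name__, s)] += dist[s] / n
    # Multiplicative GIS update, written additively in log space; features
    # with zero empirical count are left untouched in this sketch.
    for k in emp:
        if emp[k] > 0.0 and mod[k] > 0.0:
            lambdas[k] = lambdas.get(k, 0.0) + math.log(emp[k] / mod[k]) / C
    return lambdas
```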
Results and conclusions:
- an increase in precision and recall over the baseline models, achieved while training on only one document and testing on the remaining n-1 documents
- representing observations through problem-related features is far more effective than a token-level representation (illustrated after this list)
- capturing the structural regularities of the documents through the Markov state sequence is critical to performance
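Illustrating the second point above: in the paper's FAQ segmentation experiments, observations are whole lines, and the features are line-level properties rather than individual tokens. The feature names below appear in the paper's feature list; the implementations are my guesses.

```python
import re

# Line-level binary features in the spirit of the paper's FAQ experiments
# (the paper uses 24 such features; these implementations are illustrative).
def begins_with_number(line): return bool(re.match(r"\s*\d", line))
def contains_question_mark(line): return "?" in line
def indented(line): return line.startswith((" ", "\t"))
def blank(line): return line.strip() == ""

features = [begins_with_number, contains_question_mark, indented, blank]
```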
Comments
- Since the experimental results on the FAQ dataset showed an increase in both precision and recall, the paper would have benefited from evaluating the new model on a different kind of dataset and reporting those results as well.