Rbalasub writeup of Freitag, McCallum and Pereira
The authors propose a conditional sequence model based on maximum entropy. This permits long-range and/or non-independent features, which is not practical with HMMs. By learning only the conditional distribution of the current state given the observation and the previous state, the model wastes no effort on learning the generative probabilities of the observations. The flaw in the model is that it normalizes each state's distribution locally instead of modeling the conditional distribution of the states of all symbols in the sequence at the same time. This flaw was later rectified in the CRF model. To learn the distributions, the authors use generalized iterative scaling (GIS), an iterative fitting procedure for maxent models. Work published after this paper has produced faster optimization techniques.
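To make the per-state conditional distribution concrete, here is a minimal Python sketch, not the authors' implementation. It assumes binary features identified by name; the functions transition_probs, sequence_prob and toy_features, the feature names, and the "start" state are all hypothetical and chosen only for illustration. Training (GIS or otherwise) is not shown.

```python
import math

def toy_features(prev_state, observation, state):
    """Hypothetical binary features; note they may overlap on one observation."""
    feats = [f"prev={prev_state}&state={state}"]
    if observation[:1].isupper():
        feats.append(f"capitalized&state={state}")
    if observation.endswith("?"):
        feats.append(f"ends_with_question&state={state}")
    return feats

def transition_probs(weights, features, prev_state, observation, states):
    """P(state | prev_state, observation) as an exponential model,
    normalized over the candidate next states only (local normalization)."""
    scores = {}
    for s in states:
        active = features(prev_state, observation, s)
        # Unseen features get weight 0.0 (an assumption of this sketch).
        scores[s] = math.exp(sum(weights.get(f, 0.0) for f in active))
    z = sum(scores.values())
    return {s: v / z for s, v in scores.items()}

def sequence_prob(weights, features, observations, state_seq, states, start="start"):
    """Probability of a state sequence as a product of local distributions."""
    prob, prev = 1.0, start
    for obs, s in zip(observations, state_seq):
        prob *= transition_probs(weights, features, prev, obs, states)[s]
        prev = s
    return prob
```

Because each step is normalized on its own, the sequence probability is just a product of local terms. That per-step normalization is the property the CRF replaces with a single normalization over whole state sequences.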
The paper is impressive because it paved the way for powerful conditional models and was at the head of the trend away from generative models in cases where the full generative power wasn't needed.