Nschneid writeup of McCallum 2000

From Cohen Courses

Latest revision as of 11:42, 3 September 2010

This is Nschneid's review of Frietag_2000_Maximum_Entropy_Markov_Models_for_Information_Extraction_and_Segmentation

The MEMM paper. MEMMs are close cousins of linear-chain CRFs, but each state's next-state distribution is a locally normalized maximum entropy model conditioned on the current observation and the previous state (whereas a linear-chain CRF is globally normalized and can condition on the entire observation sequence). The paper argues convincingly for conditioning on observations and for allowing overlapping, non-independent features. It gives a clear presentation of the modified forward-backward recursion, training with GIS, and some variants (Baum-Welch-style EM for the semi-supervised case, and a reinforcement learning variant). It does not address the label bias problem.
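The modified forward recursion can be sketched as below. This is a minimal illustration, not the paper's implementation: `score()` is a hypothetical stand-in for the maxent feature-weight dot product, and the start-state handling (a `None` previous-state context) is an assumption. The key structural point is that each (previous state, observation) pair gets its own locally normalized next-state distribution, so the alphas at every time step sum to one — the local normalization that underlies label bias.

```python
import math

def softmax(scores):
    # Normalize a dict of raw scores into a probability distribution.
    m = max(scores.values())
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    z = sum(exps.values())
    return {k: v / z for k, v in exps.items()}

def memm_forward(observations, states, score):
    """Modified forward recursion for an MEMM (illustrative sketch).

    `score(prev_state, state, obs)` stands in for the feature-weight
    dot product of the maxent model; P(s | s', o) is the softmax of
    score(s', s, o) over s, i.e. a separate locally normalized
    distribution for each (previous state, observation) context.
    """
    # alpha[t][s]: probability of being in state s after o_1..o_t
    alpha = [{s: 0.0 for s in states} for _ in observations]
    # Hypothetical start context: previous state is None.
    first = softmax({s: score(None, s, observations[0]) for s in states})
    for s in states:
        alpha[0][s] = first[s]
    for t in range(1, len(observations)):
        for prev in states:
            # One locally normalized transition distribution per
            # (prev, observation) pair -- the defining MEMM property.
            trans = softmax({s: score(prev, s, observations[t])
                             for s in states})
            for s in states:
                alpha[t][s] += alpha[t - 1][prev] * trans[s]
    return alpha
```

Because every transition distribution is normalized, `sum(alpha[t].values())` is 1.0 at every `t` (unlike the HMM forward algorithm, where the alphas sum to the joint probability of the observation prefix); probability mass is only redistributed among states, never discarded, which is exactly the behavior the label bias critique targets.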

Experiments on a FAQ segmentation and classification task with a small corpus show the MEMM outperforming several HMM variants. I would have liked to see more experiments, however, e.g. on POS tagging and NER.