Klein 2002 conditional structure versus conditional estimation in nlp models write up

This is a review of the paper Klein_2002_conditional_structure_versus_conditional_estimation_in_nlp_models by user:sgopal1.

  • Contributions of the paper
    • This paper focuses on comparing how different model assumptions affect the evaluation measures. It draws a clear line between conditional parameter estimation and conditional model structure, and compares the two.
    • The paper also draws conclusions about optimizing different objective functions for a standard classifier such as Naive Bayes. Optimizing different objectives over the same model structure sometimes recovers standard statistical models; for instance, maximizing conditional likelihood with a Naive Bayes structure corresponds to logistic regression (see the first sketch after this list).
    • The authors claim that maximizing the conditional likelihood generally leads to better performance, and this holds even for smaller datasets (except perhaps for some of the extremely small ones). They give a fantastic explanation of why this is true, especially in NLP where smoothing plays an important role.
    • They compare HMMs trained with different objective functions against MEMMs, and bring out interesting observations such as observation bias and how unobserving a word changes how the model fits the data (the factorizations being compared are sketched after this list).
  • I liked the way in which the experiments were performed, questioning the validity of each model's assumptions and how those assumptions relate to the task at hand. The explanation of phenomena such as observation bias sounds convincing to me.
  • One thing I'm not really convinced by is that they did not talk about the ease with which additional features can be added to a model such as an MEMM, something that cannot be done in an HMM without making the joint likelihood more complicated.
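
To make the Naive Bayes point above concrete (a sketch in my own notation, not the paper's): the conditional distribution implied by a Naive Bayes model is

  P(y \mid x) = \frac{P(y) \prod_i P(x_i \mid y)}{\sum_{y'} P(y') \prod_i P(x_i \mid y')}

which, writing \lambda_y = \log P(y) and \lambda_{y,i} = \log P(x_i \mid y), has the log-linear form \frac{\exp(\lambda_y + \sum_i \lambda_{y,i})}{\sum_{y'} \exp(\lambda_{y'} + \sum_i \lambda_{y',i})}. Maximizing joint likelihood over this structure gives the usual count-based Naive Bayes estimates, while maximizing conditional likelihood over the same form is exactly multiclass logistic regression / maximum entropy.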
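
The sequence-model contrast rests on the standard factorizations (again my own sketch): an HMM defines a joint distribution over states and observations,

  P(s, o) = \prod_t P(s_t \mid s_{t-1}) \, P(o_t \mid s_t),

while an MEMM defines a locally normalized conditional distribution,

  P(s \mid o) = \prod_t P(s_t \mid s_{t-1}, o_t).

Because each MEMM factor conditions on both the previous state and the current observation and must normalize locally, the trained model can lean too heavily on one of the two sources of evidence; observation bias and the effect of unobserving a word are both about this trade-off.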