KeisukeKamataki writeup of Klein and Manning 2002
This is a review of klein_2002_conditional_structure_versus_conditional_estimation_in_nlp_models by user:KeisukeKamataki.
- Summary: This paper gives an overview of probabilistic approaches for statistical NLP tasks. Specifically, the authors separate conditional parameter estimation from conditional model structure and study each factor on its own in order to understand what each contributes.
For parameter estimation, they took Naive Bayes models as the test case and compared joint likelihood and conditional likelihood objective functions on word sense disambiguation. Generally speaking, conditional likelihood outperformed joint likelihood, but there were many exceptions, especially since performance is strongly affected by the amount of training data. Overall, maximizing conditional likelihood usually helped improve WSD accuracy.
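To make the contrast concrete, the two objectives for a Naive Bayes model over a sense $c$ and context features $\vec{f}$ can be sketched as follows (my own notation, not the paper's exact formulation):

$$\hat{\Theta}_{JL} = \arg\max_{\Theta} \sum_i \log P_{\Theta}(c_i, \vec{f}_i), \qquad \hat{\Theta}_{CL} = \arg\max_{\Theta} \sum_i \log P_{\Theta}(c_i \mid \vec{f}_i), \qquad \text{with } P_{\Theta}(c, \vec{f}) = P(c)\prod_j P(f_j \mid c).$$

The joint objective has the familiar closed-form relative-frequency solution, while the conditional objective must be optimized numerically and, as I understand it, effectively turns the same Naive Bayes structure into a maximum-entropy-style discriminative classifier.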
For model structure, they compared HMM and CMM structures on POS tagging accuracy. With the model structure held fixed, conditional likelihood estimation worked better, for the HMM and likewise for the MEMM/CMM. With the objective function held fixed, the HMM structure worked consistently better. However, this result may depend on how unobserved words are treated in the CMM.
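The structural contrast can be sketched as follows (again in my own notation): the HMM is a joint model over tags and words, while the CMM conditions each state directly on the current observation:

$$P(\vec{s}, \vec{o}) = \prod_t P(s_t \mid s_{t-1})\, P(o_t \mid s_t) \quad \text{(HMM)}, \qquad P(\vec{s} \mid \vec{o}) = \prod_t P(s_t \mid s_{t-1}, o_t) \quad \text{(CMM)}.$$

Conditioning each state directly on the current word is what makes the handling of unseen words, and the bias effects discussed below, matter so much.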
Through error analysis of the CMM, they identified an observation bias: the current observation explains the hidden state so well that the previous states are effectively ignored. Conversely, the HMM may tend to suffer from label bias, which is the opposite phenomenon. Both phenomena stem from the models' conditional independence assumptions, and their effect can be either good or bad.
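A tiny made-up example of the observation-bias effect (my own toy numbers, not from the paper): when the word's identity already "explains" the tag in the CMM's local model, the previous tag can no longer override it, whereas in the HMM the transition still competes with the emission.

```python
# Toy illustration of observation bias (numbers invented for illustration only).
# One position: choose tag N or V for the word "flies", given the previous tag.

# CMM local model P(tag | prev_tag, word): the word dominates, prev_tag barely matters.
p_cmm = {
    ("DT", "flies"):  {"N": 0.95, "V": 0.05},
    ("PRP", "flies"): {"N": 0.90, "V": 0.10},
}

# HMM factors P(tag | prev_tag) * P(word | tag): the transition competes with the emission.
p_trans = {"DT": {"N": 0.8, "V": 0.2}, "PRP": {"N": 0.1, "V": 0.9}}
p_emit = {"N": {"flies": 0.010}, "V": {"flies": 0.005}}

def cmm_choice(prev, word):
    dist = p_cmm[(prev, word)]
    return max(dist, key=dist.get)

def hmm_choice(prev, word):
    scores = {t: p_trans[prev][t] * p_emit[t][word] for t in ("N", "V")}
    return max(scores, key=scores.get)

for prev in ("DT", "PRP"):
    print(prev, "CMM:", cmm_choice(prev, "flies"), "HMM:", hmm_choice(prev, "flies"))
# The CMM picks N after both DT and PRP (the observation wins), while the HMM
# lets the PRP context pull the same word toward V.
```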
- I like: Error analysis => They did a thorough error analysis and concluded that the independence assumptions of the conditional model structure are sometimes unsuited to linguistic sequences. I think this kind of error analysis is important for building a better understanding of NLP modeling techniques.