Mnduong writeup of Klein & Manning

This is a review of Klein_2002_conditional_structure_versus_conditional_estimation_in_nlp_models by user:mnduong.

  • This paper separates the effect of changing the model structure from the effect of changing the parameter estimation criterion. The motivation is to identify the source of the high performance reported for both generative and discriminative methods.
  • The first finding is that conditional parameter estimation, whose training objective matches the criterion used in evaluation, improves both training and testing accuracy. To isolate this effect, the authors fix a single Naive Bayes model structure for word sense disambiguation and train its parameters under different objectives (the two objectives are sketched after this list).
  • The other finding is that switching to a conditional model structure hurt performance. This was demonstrated on POS tagging by comparing an HMM with an MEMM (the two factorizations are contrasted below). However, the authors do not believe that label bias, as described by Lafferty et al. (2001), is the main cause; rather, they argue that the degradation stems from the MEMM's conditional independence assumptions, under which later observations cannot influence the current state.
  • I like the thorough experiments the authors carried out to support their claims and to separate the effects of parameter estimation and model structure. Previous work (e.g., Lafferty et al. (2001); Sha & Pereira (2003)) discussed the advantages and disadvantages of HMMs and MEMMs in terms of parameter estimation and model structure, but did not provide empirical evidence to back up those arguments. This work further convinces me that the CRF is a good choice, because it combines conditional parameter estimation with a model structure that lets all observations influence each state (the standard linear-chain form is recalled after this list).
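
To make the estimation contrast concrete, here is a sketch of the two training objectives over the same Naive Bayes parameterization, writing $c_i$ for the sense label and $\mathbf{d}_i$ for the observed context of example $i$ (this notation is mine, not the paper's):

$\hat{\theta}_{\mathrm{joint}} = \arg\max_{\theta} \prod_i P_{\theta}(c_i, \mathbf{d}_i) \qquad\qquad \hat{\theta}_{\mathrm{cond}} = \arg\max_{\theta} \prod_i P_{\theta}(c_i \mid \mathbf{d}_i)$

The joint objective has closed-form relative-frequency estimates, while the conditional one must be optimized numerically; a Naive Bayes model with conditionally estimated parameters behaves, in effect, like a maximum-entropy (logistic regression) classifier over the same features.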
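
To see the structural difference on tagging, compare the two first-order factorizations over tags $t_1, \ldots, t_n$ and words $w_1, \ldots, w_n$ (a standard rendering, not copied from the paper):

HMM (joint): $P(t_{1:n}, w_{1:n}) = \prod_{i=1}^{n} P(t_i \mid t_{i-1}) \, P(w_i \mid t_i)$

MEMM (conditional): $P(t_{1:n} \mid w_{1:n}) = \prod_{i=1}^{n} P(t_i \mid t_{i-1}, w_i)$

Because each MEMM factor is locally normalized over $t_i$ given only $t_{i-1}$ and $w_i$, the posterior over $t_i$ depends only on $w_1, \ldots, w_i$: evidence from later words cannot flow backwards. In the HMM, words are generated from tags, so later observations do propagate back through the chain.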
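
For contrast, the linear-chain CRF of Lafferty et al. (2001) keeps the conditional training objective but replaces per-state normalization with a single sequence-level partition function:

$P(t_{1:n} \mid w_{1:n}) = \frac{1}{Z(w_{1:n})} \exp\Big( \sum_{i=1}^{n} \sum_{k} \lambda_k \, f_k(t_{i-1}, t_i, w_{1:n}, i) \Big)$

Since $Z(w_{1:n})$ sums over entire tag sequences and each feature $f_k$ may inspect the whole observation sequence, later words can influence earlier tags, which is precisely what the MEMM's local factorization gives up.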