Liuy writeup of Klein 2002
This is a review of Klein 2002, "Conditional Structure versus Conditional Estimation in NLP Models", by user:Liuy.
Of the two common explaining-away effects in NLP models, label bias and observation bias (both of which stem from the model's independence assumptions), the latter is the more severe, especially for simple POS tagging, given the complexity and sparsity of NLP problems.
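The label-bias half of this can be made concrete with a toy example. The sketch assumes a hypothetical locally normalized (CMM-style) model in which each state's outgoing probabilities are normalized over its own successors only; the state graph `SUCCESSORS`, the `WEIGHTS` table, and the words "r", "i", "o" are all invented for illustration. It shows label bias only, not observation bias.

```python
import math

# Hypothetical toy state graph: each state lists the successors it may move to.
SUCCESSORS = {
    "start": ["s1", "s4"],
    "s1": ["s2"],   # s1 has exactly one successor
    "s4": ["s5"],   # s4 has exactly one successor
}

# Hypothetical feature weights keyed by (previous state, next state, word).
WEIGHTS = {
    ("start", "s1", "r"): 0.5,   # slight preference for s1 after "r"
    ("start", "s4", "r"): 0.0,
    ("s1", "s2", "i"): 5.0,      # s2 strongly expects "i"
    ("s4", "s5", "o"): 5.0,      # s5 strongly expects "o"
}

def local_prob(prev, nxt, word, weights):
    """P(nxt | prev, word): softmax over prev's own successors only."""
    scores = {s: weights.get((prev, s, word), 0.0) for s in SUCCESSORS[prev]}
    z = sum(math.exp(v) for v in scores.values())
    return math.exp(scores[nxt]) / z

def path_prob(path, words, weights):
    """Product of locally normalized transition probabilities along a path."""
    p, prev = 1.0, "start"
    for state, word in zip(path, words):
        p *= local_prob(prev, state, word, weights)
        prev = state
    return p

if __name__ == "__main__":
    words = ["r", "o"]
    print(path_prob(["s1", "s2"], words, WEIGHTS))
    print(path_prob(["s4", "s5"], words, WEIGHTS))
```

On the input "r o", the path s1→s2 gets about 0.62 and s4→s5 about 0.38, even though "o" is exactly what s5 expects: because s1 has a single successor, P(s2 | s1, "o") is 1 regardless of the observation, so the second word cannot change the decision.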
The paper mainly tries to argue, and offers some evidence, that some part-of-speech tagging errors arise because the independence assumptions of a conditional model structure are improperly imposed on linguistic sequences. In particular, the authors study HMMs and CMMs for POS tagging to show how these assumptions can hurt accuracy.
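For concreteness, the simplest first-order versions of the two structures factor as follows, with t_0 a start symbol. This is an illustrative sketch, not the paper's exact models; the CMMs studied there may condition on richer context than the current word.

```latex
% HMM (joint model): each word depends only on its own tag.
P(t_{1:n}, w_{1:n}) = \prod_{i=1}^{n} P(t_i \mid t_{i-1}) \, P(w_i \mid t_i)

% CMM (conditional model): each tag decision is normalized locally,
% given the previous tag and the current word.
P(t_{1:n} \mid w_{1:n}) = \prod_{i=1}^{n} P(t_i \mid t_{i-1}, w_i)
```

The independence assumptions sit in what each factor is allowed to see: the HMM emission ignores everything but the current tag, while each CMM factor is normalized per position, which is what makes label-bias and observation-bias effects possible.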
The authors try to tease apart the separate effects of two factors: the model structure and the parameter estimation. The paper therefore separates conditional parameter estimation from conditional model structure, and shows that better features must be incorporated when a conditional model structure is used for POS tagging.
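The distinction between structure and estimation can be sketched with a single joint model scored under two objectives. Everything here is hypothetical: the tiny `DATA`, the tag set `LABELS`, and the two parameter tables `THETA_A` and `THETA_B` for the same joint structure P(word, tag). The point is only that the choice of objective, with the structure held fixed, can change which parameters win; how much this matters on real tagging data is the empirical question the paper addresses.

```python
import math

LABELS = ["N", "V"]

# Hypothetical training data: (word, tag) pairs.
DATA = [("a", "N"), ("a", "N"), ("b", "V")]

# Two hypothetical parameter settings for the SAME joint model P(word, tag).
THETA_A = {("a", "N"): 0.6, ("a", "V"): 0.05, ("b", "N"): 0.05, ("b", "V"): 0.3}
THETA_B = {("a", "N"): 0.3, ("a", "V"): 0.001, ("b", "N"): 0.001, ("b", "V"): 0.698}

def joint_log_likelihood(data, joint):
    """Sum of log P(x, y): the generative (joint) objective."""
    return sum(math.log(joint[(x, y)]) for x, y in data)

def conditional_log_likelihood(data, joint, labels=LABELS):
    """Sum of log P(y | x) = log P(x, y) - log P(x), from the same table."""
    total = 0.0
    for x, y in data:
        p_x = sum(joint[(x, label)] for label in labels)  # marginal P(x)
        total += math.log(joint[(x, y)] / p_x)
    return total

if __name__ == "__main__":
    for name, theta in [("A", THETA_A), ("B", THETA_B)]:
        print(name, joint_log_likelihood(DATA, theta),
              conditional_log_likelihood(DATA, theta))
```

Joint likelihood prefers A (about -2.23 versus -2.77), while conditional likelihood prefers B (about -0.008 versus -0.31), because B makes each tag nearly certain given its word at the cost of fitting the word frequencies badly.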
I think highly of the insights the authors offer on whether label bias is truly a significant effect in NLP problems. They explain the nature of the problem through a few generalizations: the ability to include better features leads to better performance; the assumptions implicit in the model have a large impact on errors; and the choice of objective to maximize has a small effect.
I do not like the way they handle the optimization problem. The authors mention that "optimization was done using conjugate gradient ascent", but it is not clear whether this method finds the global optimum of the objectives they consider. They should elaborate on this: how good is the solution, and can the algorithm get stuck in a local optimum?
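One way to probe this question, at least in a limited case, is a multi-start check. The sketch below swaps in plain fixed-step gradient ascent (a simpler stand-in for the conjugate gradient ascent the paper reports) on an L2-regularized logistic-regression conditional log-likelihood. The toy `DATA`, the regularization strength `l2`, the step size, and the iteration count are all assumptions. Because this particular objective is strictly concave, runs from different random starts should reach the same optimum; for an objective that is not concave, the same check could reveal disagreement, which is the kind of evidence the question above asks for.

```python
import math
import random

# Hypothetical toy data: (features with a leading bias term, binary label).
# The labels are not separable by a threshold on the second feature.
DATA = [([1.0, 0.5], 1), ([1.0, -0.3], 0), ([1.0, 1.2], 1),
        ([1.0, 0.1], 0), ([1.0, 0.8], 0), ([1.0, -0.9], 1)]

def objective_and_grad(w, data, l2):
    """L2-regularized conditional log-likelihood and its gradient:
    sum_i log P(y_i | x_i; w) - (l2 / 2) * ||w||^2, with a logistic P."""
    obj = -0.5 * l2 * sum(wj * wj for wj in w)
    grad = [-l2 * wj for wj in w]
    for x, y in data:
        z = sum(wj * xj for wj, xj in zip(w, x))
        p = 1.0 / (1.0 + math.exp(-z))  # P(y = 1 | x; w)
        obj += math.log(p if y == 1 else 1.0 - p)
        for j, xj in enumerate(x):
            grad[j] += (y - p) * xj
    return obj, grad

def gradient_ascent(w0, data, l2=1.0, step=0.1, iters=2000):
    """Plain fixed-step gradient ascent (not conjugate gradient)."""
    w = list(w0)
    for _ in range(iters):
        _, g = objective_and_grad(w, data, l2)
        w = [wj + step * gj for wj, gj in zip(w, g)]
    return w, objective_and_grad(w, data, l2)[0]

if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        w0 = [rng.uniform(-5.0, 5.0) for _ in range(2)]
        w, obj = gradient_ascent(w0, DATA)
        print(f"start={[round(v, 3) for v in w0]} -> "
              f"w={[round(v, 6) for v in w]}, objective={obj:.6f}")
```

The step size of 0.1 is chosen to be well below the inverse of a smoothness bound for this data, so each run climbs monotonically; all runs should print the same weights and objective value.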