Difference between revisions of "Gimpel and Smith, NAACL 2010"
Revision as of 21:53, 25 September 2011
Softmax-Margin CRFs: Training Log-Linear Models with Cost Functions
This paper can be found at: [1]
Citation
Kevin Gimpel and Noah A. Smith. Softmax-margin CRFs: Training log-linear models with cost functions. In Proceedings of the Human Language Technologies Conference of the North American Chapter of the Association for Computational Linguistics, pages 733-736, Los Angeles, California, USA, June 2010.
Summary
The authors want to incorporate a cost function, as used in structured SVMs, into standard conditional log-likelihood models. They introduce the softmax-margin objective function, which keeps the probabilistic form of conditional log-likelihood while adding the cost-sensitivity of max-margin training. On a named-entity recognition (NER) task, it performs significantly better than a standard conditional log-likelihood model, a max-margin model, and the perceptron, but is statistically indistinguishable from MIRA, risk, and JRB (Jensen risk bound; defined in the paper).
Brief Description of the Softmax-Margin objective function
Consider the objective functions for four methods: conditional log-likelihood, max-margin, risk, and softmax-margin. The authors' goal is to combine parts of conditional log-likelihood and max-margin, and the softmax-margin objective has terms from each of these two methods. The paper lays out three rationales:
- Bigger mistakes should be penalized more, as in max-margin methods.
- Take the conditional log-likelihood function and simply add a cost term to each candidate's score.
- "replace the 'hard' maximum of max-margin with the 'softmax' ($\log \sum \exp$) from [conditional log likelihood]; hence we use the name 'softmax-margin'".
One attractive property of softmax-margin is that it is convex, so it can be optimized easily. The paper also proves that the softmax-margin objective is greater than or equal to both the conditional log-likelihood objective and the max-margin objective, i.e. it is an upper bound on each.
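Notation: $\theta$ is the weight vector, $f(x, y)$ the feature vector, $(x_i, y_i)$ the $i$-th of $n$ training examples, $\mathcal{Y}(x_i)$ the set of candidate outputs for $x_i$, and $\mathrm{cost}(y_i, y)$ the cost of predicting $y$ when $y_i$ is correct.
Conditional log likelihood: $\min_\theta \sum_{i=1}^{n} \left[ -\theta^\top f(x_i, y_i) + \log \sum_{y \in \mathcal{Y}(x_i)} \exp\{\theta^\top f(x_i, y)\} \right]$
Max-margin: $\min_\theta \sum_{i=1}^{n} \left[ -\theta^\top f(x_i, y_i) + \max_{y \in \mathcal{Y}(x_i)} \left( \theta^\top f(x_i, y) + \mathrm{cost}(y_i, y) \right) \right]$
Risk: $\min_\theta \sum_{i=1}^{n} \sum_{y \in \mathcal{Y}(x_i)} \mathrm{cost}(y_i, y) \, \frac{\exp\{\theta^\top f(x_i, y)\}}{\sum_{y' \in \mathcal{Y}(x_i)} \exp\{\theta^\top f(x_i, y')\}}$
Softmax-margin: $\min_\theta \sum_{i=1}^{n} \left[ -\theta^\top f(x_i, y_i) + \log \sum_{y \in \mathcal{Y}(x_i)} \exp\{\theta^\top f(x_i, y) + \mathrm{cost}(y_i, y)\} \right]$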
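To make the relationship between the four objectives concrete, here is a minimal sketch that evaluates each per-example loss on a single toy example with three candidate outputs. It is an illustration only, not the authors' code: the function names, the toy scores, and the toy costs are all assumptions, and the candidate set is small enough to enumerate, which is not the case for real sequence models.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def objectives(scores, costs, gold):
    """Per-example losses for one training example.

    scores[y] stands for theta^T f(x, y) (hypothetical toy values),
    costs[y] stands for cost(y_gold, y), with costs[gold] == 0,
    gold is the index of the correct output.
    """
    gold_score = scores[gold]
    # Cost-augmented scores: theta^T f(x, y) + cost(y_gold, y)
    augmented = [s + c for s, c in zip(scores, costs)]
    log_z = log_sum_exp(scores)
    # Model distribution p(y | x), used only for the risk term
    probs = [math.exp(s - log_z) for s in scores]
    return {
        "cll": -gold_score + log_z,
        "max_margin": -gold_score + max(augmented),
        "risk": sum(p * c for p, c in zip(probs, costs)),
        "softmax_margin": -gold_score + log_sum_exp(augmented),
    }


# Toy example (assumed values): output 0 is correct, outputs 1 and 2
# are increasingly costly mistakes.
scores = [2.0, 1.0, -0.5]
costs = [0.0, 1.0, 2.0]
result = objectives(scores, costs, gold=0)

for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

On any such example with non-negative costs, the softmax-margin value should come out at least as large as both the conditional log-likelihood and max-margin values, which is the bound described above: adding a non-negative cost inside the exponent can only increase the log-sum-exp, and a log-sum-exp is never smaller than the maximum of its arguments.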
Experimental Results
The authors perform a small experiment on the CoNLL 2003 shared task in which they take care to give each model the same features. Models are compared by F1 score.
Results
Softmax-margin is statistically significantly better than the conditional log-likelihood, max-margin, and perceptron models, but statistically indistinguishable from MIRA, risk, and JRB.
Related Work
- The longer work: Gimpel and Smith, CMU 2010. K. Gimpel and N. A. Smith. 2010. Softmax-margin training for structured log-linear models. Technical report, Carnegie Mellon University.
- Others have tried to incorporate costs into a model. Previous literature: Kakade, et al., ICML 2002; Och, ACL 2003; Jansche, HLT-EMNLP 2005; Suzuki, et al., COLING-ACL 2006.