Gimpel and Smith, NAACL 2010
Softmax-Margin CRFs: Training Log-Linear Models with Cost Functions
This paper can be found at: [1]
Citation
Kevin Gimpel and Noah A. Smith. Softmax-margin CRFs: Training log-linear models with cost functions. In Proceedings of the Human Language Technologies Conference of the North American Chapter of the Association for Computational Linguistics, pages 733-736, Los Angeles, California, USA, June 2010.
Summary
The authors want to incorporate a cost function, as used in structured SVMs, into standard conditional log-likelihood (CLL) models. They introduce the softmax-margin objective function, which keeps the probabilistic log-linear form of CLL while penalizing outputs according to their cost, as max-margin training does. On an NER task, softmax-margin performs significantly better than a standard conditional log-likelihood model, a max-margin model, and the perceptron, but is statistically indistinguishable from MIRA, risk, and JRB (Jensen risk bound; defined in the paper).
Brief Description of the Softmax-Margin objective function
Consider the objective functions of the four methods listed below. The authors' goal is to combine parts of conditional log-likelihood and max-margin, and the softmax-margin objective has terms from each of these two methods. The paper lays out three rationales:
- Bigger mistakes should be penalized more, as in max-margin methods.
- Take the conditional log-likelihood objective and simply add a cost term to the score of each candidate output.
- "replace the 'hard' maximum of max-margin with the 'softmax' ($\log \sum \exp$) from [conditional log likelihood]; hence we use the name 'softmax-margin'".
One of the appealing properties of softmax-margin is that it is convex, so it can be optimized easily. In the paper, the authors show that the softmax-margin objective is greater than or equal to both the conditional log-likelihood objective and the max-margin objective, i.e., it is an upper bound on each.
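The bound can be checked numerically on a toy problem. The following is a minimal sketch, not the authors' implementation: it assumes a tiny output space that can be enumerated directly, and the scores, costs, and function names are made up for illustration. It also assumes the cost is nonnegative and zero for the gold output.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def cll_loss(scores, gold):
    # Negative conditional log-likelihood of the gold output.
    return -scores[gold] + log_sum_exp(scores)


def max_margin_loss(scores, costs, gold):
    # Cost-augmented "hard" maximum over all candidate outputs.
    return -scores[gold] + max(s + c for s, c in zip(scores, costs))


def softmax_margin_loss(scores, costs, gold):
    # Same as max-margin, with the max replaced by log-sum-exp ("softmax").
    return -scores[gold] + log_sum_exp([s + c for s, c in zip(scores, costs)])


# Hypothetical toy instance: three candidate outputs, index 0 is gold.
scores = [2.0, 1.5, -0.5]  # model score theta . f(x, y) for each candidate y
costs = [0.0, 1.0, 3.0]    # cost(gold, y); zero for the gold output
gold = 0

cll = cll_loss(scores, gold)
mm = max_margin_loss(scores, costs, gold)
sm = softmax_margin_loss(scores, costs, gold)

print(f"CLL={cll:.3f}  max-margin={mm:.3f}  softmax-margin={sm:.3f}")
print("softmax-margin >= CLL:", sm >= cll)
print("softmax-margin >= max-margin:", sm >= mm)
```

In a real sequence model the candidate set is exponentially large, so the sum and the maximum would be computed with dynamic programming rather than by enumeration as done here.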
Conditional log likelihood:
Max-margin:
Risk:
Softmax-margin:
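The formulas for these four objectives did not survive on this page. As a sketch, they can be written in their usual forms as follows. The notation is an assumption rather than a copy of the paper's typesetting: training pairs $(x^{(i)}, y^{(i)})$, weight vector $\theta$, feature function $f$, candidate output set $\mathcal{Y}(x)$, and a nonnegative cost function. Regularization terms are left out.

```latex
% Assumed notation: (x^{(i)}, y^{(i)}) are training pairs, \theta the weights,
% f(x, y) the feature vector, \mathcal{Y}(x) the candidate outputs for x.
\begin{align*}
\text{CLL:} \quad
  & \min_{\theta} \sum_{i=1}^{n} -\theta^{\top} f(x^{(i)}, y^{(i)})
    + \log \sum_{y \in \mathcal{Y}(x^{(i)})} \exp\{\theta^{\top} f(x^{(i)}, y)\} \\
\text{Max-margin:} \quad
  & \min_{\theta} \sum_{i=1}^{n} -\theta^{\top} f(x^{(i)}, y^{(i)})
    + \max_{y \in \mathcal{Y}(x^{(i)})}
      \left( \theta^{\top} f(x^{(i)}, y) + \mathrm{cost}(y^{(i)}, y) \right) \\
\text{Risk:} \quad
  & \min_{\theta} \sum_{i=1}^{n} \sum_{y \in \mathcal{Y}(x^{(i)})}
    \mathrm{cost}(y^{(i)}, y)\,
    \frac{\exp\{\theta^{\top} f(x^{(i)}, y)\}}
         {\sum_{y' \in \mathcal{Y}(x^{(i)})} \exp\{\theta^{\top} f(x^{(i)}, y')\}} \\
\text{Softmax-margin:} \quad
  & \min_{\theta} \sum_{i=1}^{n} -\theta^{\top} f(x^{(i)}, y^{(i)})
    + \log \sum_{y \in \mathcal{Y}(x^{(i)})}
      \exp\{\theta^{\top} f(x^{(i)}, y) + \mathrm{cost}(y^{(i)}, y)\}
\end{align*}
```

Written this way, softmax-margin differs from CLL only by the cost term inside the exponent, and from max-margin only by using $\log \sum \exp$ in place of $\max$.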
Experimental Results
The authors perform a small experiment on the CoNLL 2003 shared task, taking care to give each model the same features. Models are evaluated by F1 score.
JRB (Jensen risk bound) is an upper bound on risk that is much easier to compute than risk itself. (Risk is not necessarily convex.)
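To make the relationship between risk and JRB concrete, here is a small sketch. It assumes JRB has the form $\log E_{p_\theta}[\exp \mathrm{cost}]$, i.e., Jensen's inequality applied to the expected cost; check the paper for the exact definition. The toy scores, costs, and function names are hypothetical.

```python
import math


def posterior(scores):
    """Model distribution p(y | x) over candidates, proportional to exp(score)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def risk(scores, costs):
    # Expected cost under the model distribution.
    return sum(p * c for p, c in zip(posterior(scores), costs))


def jensen_risk_bound(scores, costs):
    # Assumed form of JRB: log of the expected exponentiated cost.
    probs = posterior(scores)
    return math.log(sum(p * math.exp(c) for p, c in zip(probs, costs)))


# Hypothetical toy instance: three candidate outputs, index 0 is gold.
scores = [2.0, 1.5, -0.5]
costs = [0.0, 1.0, 3.0]
gold = 0

r = risk(scores, costs)
jrb = jensen_risk_bound(scores, costs)

# CLL and softmax-margin losses on the same instance, by direct enumeration.
cll = -scores[gold] + math.log(sum(math.exp(s) for s in scores))
sm = -scores[gold] + math.log(
    sum(math.exp(s + c) for s, c in zip(scores, costs))
)

print(f"risk={r:.3f}  JRB={jrb:.3f}")
print("JRB >= risk:", jrb >= r)
print("softmax-margin == CLL + JRB:", abs(sm - (cll + jrb)) < 1e-9)
```

Under this assumed definition, the softmax-margin loss decomposes as the CLL loss plus JRB, which is one way to see why softmax-margin also upper-bounds risk.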
Results
Softmax-margin is statistically significantly better than the conditional log-likelihood, max-margin, and perceptron models, but statistically indistinguishable from MIRA, risk, and JRB.
Related Work
- The longer work: Gimpel and Smith, CMU 2010. K. Gimpel and N. A. Smith. 2010. Softmax-margin training for structured log-linear models. Technical report, Carnegie Mellon University.
- Other people have tried to incorporate costs into a model. Previous literature: