Gimpel and Smith, NAACL 2010

From Cohen Courses
Revision as of 22:38, 25 September 2011 by Amr1

Softmax-Margin CRFs: Training Log-Linear Models with Cost Functions

This paper can be found at: [1]


Kevin Gimpel and Noah A. Smith. Softmax-margin CRFs: Training log-linear models with cost functions. In Proceedings of the Human Language Technologies Conference of the North American Chapter of the Association for Computational Linguistics, pages 733-736, Los Angeles, California, USA, June 2010.


The authors want to incorporate a cost function, as used in structured SVMs, into standard conditional log likelihood training. They introduce the softmax-margin objective function, which combines the two approaches. On a named-entity recognition (NER) task, softmax-margin performs significantly better than a standard conditional log likelihood model, a max-margin model, and the perceptron. It is statistically indistinguishable from MIRA, risk, and JRB (Jensen risk bound, defined in the paper).

Brief Description of the Softmax-Margin objective function

Consider the objective functions for these four methods. The authors' goal is to combine parts of conditional log likelihood and max-margin training, and the softmax-margin objective contains terms from each of these two methods. The paper gives three rationales:

  • bigger mistakes should be penalized more heavily, as in max-margin methods;
  • take the conditional log likelihood objective and add a cost term to it;
  • "replace the 'hard' maximum of max-margin with the 'softmax' ($\log \sum \exp$) from [conditional log likelihood]; hence we use the name 'softmax-margin'".
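The three rationales can be made concrete on a toy problem. The Python sketch below is an illustration, not the authors' code. It enumerates every labeling of a tiny input and computes the per-example conditional log likelihood (CLL), max-margin, and softmax-margin losses. The Hamming cost, the label set, and the scores are assumptions made for illustration. A real CRF would sum or maximize over outputs with dynamic programming instead of enumerating them.

```python
import math
from itertools import product


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def hamming_cost(gold, y):
    """Assumed cost function: number of positions where y differs from gold."""
    return sum(g != p for g, p in zip(gold, y))


def losses(scores, gold, cost=hamming_cost):
    """Return per-example (CLL, max-margin, softmax-margin) losses.

    `scores` maps every candidate output y (a tuple of labels) to its model
    score theta^T f(x, y). The output space is enumerated explicitly, which
    is only feasible for toy inputs.
    """
    gold_score = scores[gold]
    # CLL: -score(gold) + log sum_y exp(score(y))
    cll = -gold_score + log_sum_exp(list(scores.values()))
    # Max-margin: -score(gold) + max_y [score(y) + cost(gold, y)]
    max_margin = -gold_score + max(s + cost(gold, y) for y, s in scores.items())
    # Softmax-margin: the "hard" max above replaced by log-sum-exp,
    # i.e. CLL with the cost added inside each exponent.
    softmax_margin = -gold_score + log_sum_exp(
        [s + cost(gold, y) for y, s in scores.items()]
    )
    return cll, max_margin, softmax_margin


# Toy example (hypothetical): a 2-token input, labels O / PER, made-up scores.
labels = ("O", "PER")
gold = ("PER", "O")
made_up_scores = [1.0, 0.2, 2.0, 0.5]
scores = dict(zip(product(labels, repeat=2), made_up_scores))

cll, max_margin, softmax_margin = losses(scores, gold)
print(f"CLL={cll:.3f}  max-margin={max_margin:.3f}  softmax-margin={softmax_margin:.3f}")
```

Setting the cost to zero turns softmax-margin back into CLL. Replacing its log-sum-exp with a max gives the max-margin loss.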

An attractive property of softmax-margin is that it is convex, so it can be optimized easily. The paper also proves that the softmax-margin objective is greater than or equal to both the conditional log likelihood objective and the max-margin objective.
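Both properties follow from a short argument. It is sketched here for a single training example $(x^{(i)}, y^{(i)})$, assuming the cost is nonnegative and does not depend on $\theta$. The shorthand $s(y) = \theta^\top f(x^{(i)}, y)$ and $c(y) = \text{cost}(y^{(i)}, y) \ge 0$ is introduced here for brevity:

```latex
\begin{aligned}
\ell_{\mathrm{SMM}} &= -s(y^{(i)}) + \log \sum_{y} \exp\{ s(y) + c(y) \} \\
&\ge -s(y^{(i)}) + \log \sum_{y} \exp\{ s(y) \} = \ell_{\mathrm{CLL}}
  && \text{because } c(y) \ge 0, \\
\ell_{\mathrm{SMM}} &\ge -s(y^{(i)}) + \max_{y} \big( s(y) + c(y) \big) = \ell_{\mathrm{MM}}
  && \text{because a sum of positive terms is at least its largest term and } \log \text{ is increasing.}
\end{aligned}
```

For convexity, note that each $s(y)$ is linear in $\theta$ and $c(y)$ is a constant, so every exponent is affine in $\theta$. The log-sum-exp of affine functions is convex, $-s(y^{(i)})$ is linear, and a sum of convex per-example losses stays convex.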

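Conditional log likelihood:

$$\min_{\theta} \sum_{i=1}^{n} \Big[ -\theta^\top f(x^{(i)}, y^{(i)}) + \log \sum_{y \in \mathcal{Y}(x^{(i)})} \exp\{ \theta^\top f(x^{(i)}, y) \} \Big]$$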

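The max-margin and softmax-margin objectives can be written in the same notation for comparison. Here $(x^{(i)}, y^{(i)})$ are the $n$ training pairs, $f$ is the feature vector, $\theta$ are the weights, $\mathcal{Y}(x^{(i)})$ is the set of candidate outputs, and $\text{cost}(y^{(i)}, y) \ge 0$ is the cost of predicting $y$ when the gold output is $y^{(i)}$. This is a reconstruction from the paper's description, so its notation may differ from the original:

```latex
\text{Max-margin:}\quad
\min_{\theta} \sum_{i=1}^{n} \Big[ -\theta^\top f(x^{(i)}, y^{(i)})
  + \max_{y \in \mathcal{Y}(x^{(i)})} \big( \theta^\top f(x^{(i)}, y) + \text{cost}(y^{(i)}, y) \big) \Big]

\text{Softmax-margin:}\quad
\min_{\theta} \sum_{i=1}^{n} \Big[ -\theta^\top f(x^{(i)}, y^{(i)})
  + \log \sum_{y \in \mathcal{Y}(x^{(i)})} \exp\big\{ \theta^\top f(x^{(i)}, y) + \text{cost}(y^{(i)}, y) \big\} \Big]
```

Softmax-margin takes the log-sum-exp from conditional log likelihood and the cost term from max-margin.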
Experimental Results

The authors run a small named-entity recognition experiment on the CoNLL 2003 shared task, taking care to give each model the same features.


Table: Softmax-margin is statistically better than the conditional log likelihood, max-margin, and perceptron models, but statistically indistinguishable from MIRA, risk, and JRB.

Related Work