Sgardine proof of convexity of CRF likelihood

From Cohen Courses

Since it came up several times in class, I decided to prove the concavity of the CRF likelihood function to my own satisfaction. I believe this could also be proved more generally for exponential-family distributions, but I've proved only the special case directly here.

Lemma 1: linear functions are convex (and concave)

Direct consequence of the definition of convexity, BV (3.1)

Lemma 2: Log-sum-exp is convex

The function f(x_1, ..., x_n) = log Σ_{i=1..n} exp(x_i) is convex, as can be verified from its Hessian; BV (3.1.5, pg. 73)
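As a numerical sanity check (not part of the proof), the defining convexity inequality for log-sum-exp can be verified at random points; the function names and the stable max-shift implementation below are my own, not from BV.

```python
import math
import random

def log_sum_exp(xs):
    # Numerically stable log-sum-exp: shift by the max before exponentiating.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

random.seed(0)
for _ in range(1000):
    x = [random.uniform(-5, 5) for _ in range(4)]
    y = [random.uniform(-5, 5) for _ in range(4)]
    t = random.random()
    mix = [t * a + (1 - t) * b for a, b in zip(x, y)]
    # Convexity: f(t x + (1-t) y) <= t f(x) + (1-t) f(y), up to float tolerance.
    assert log_sum_exp(mix) <= t * log_sum_exp(x) + (1 - t) * log_sum_exp(y) + 1e-9
```

This only spot-checks the inequality at sampled points, of course; the Hessian argument in BV is what actually establishes convexity everywhere.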

Lemma 3: sums of concave functions are concave

BV (3.2.1)

Lemma 4: If h:R^k → R is convex and nondecreasing in each argument, and each component of g:R^n → R^k is convex, then f(x) = h(g(x)) is convex

BV (3.2.4, pg. 86)

Lemma 5: Log-sum-exp is nondecreasing in each argument

The partial derivative with respect to x_i is exp(x_i) / Σ_j exp(x_j), which is always positive.
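This partial derivative (the i-th softmax component) can be checked against a finite difference; the helper names below are illustrative, and the block redefines log_sum_exp to stay self-contained.

```python
import math

def log_sum_exp(xs):
    # Numerically stable log-sum-exp.
    m = max(xs)
    return m + math.log(sum(math.exp(v - m) for v in xs))

def softmax_component(xs, i):
    # Claimed partial derivative: exp(x_i) / sum_j exp(x_j), always in (0, 1).
    m = max(xs)
    return math.exp(xs[i] - m) / sum(math.exp(v - m) for v in xs)

x = [0.5, -1.2, 3.0]
h = 1e-6
for i in range(len(x)):
    bumped = list(x)
    bumped[i] += h
    numeric = (log_sum_exp(bumped) - log_sum_exp(x)) / h
    analytic = softmax_component(x, i)
    assert abs(numeric - analytic) < 1e-4
    assert analytic > 0  # positive partials: log-sum-exp is nondecreasing in each argument
```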

Proposition: The log-likelihood of CRF is concave

The log-likelihood (as stated, e.g., in Sha & Pereira 2003) is

L(λ) = Σ_k [ Σ_i λ_i f(y_k, x_k, i) − log Σ_y exp Σ_i λ_i f(y, x_k, i) ]

Now each λ_i f(y, x, i) is linear in λ and hence both concave and convex (Lemma 1).

The first inner term (the sum over i) is then concave by Lemma 3.

The subtracted part (aka log Z_λ) is log-sum-exp (convex by Lemma 2, nondecreasing in each argument by Lemma 5) composed with linear functions of λ (Lemma 1), hence convex by Lemma 4; its additive inverse is therefore concave.

The bracketed difference is then concave by Lemma 3, and the sum over k is concave, again by Lemma 3.
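The conclusion can be sanity-checked numerically on a toy log-linear model (a degenerate CRF with a single output variable); the two indicator features, the data, and all names below are made up for illustration, not taken from Sha & Pereira.

```python
import math
import random

def log_likelihood(lam, data, feats):
    # L(lam) = sum_k [ sum_i lam_i f(y_k, x_k, i) - log sum_y exp sum_i lam_i f(y, x_k, i) ]
    total = 0.0
    for x, y in data:
        scores = {yy: sum(l * f(yy, x) for l, f in zip(lam, feats)) for yy in (0, 1)}
        m = max(scores.values())
        log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
        total += scores[y] - log_z
    return total

# Two made-up indicator features over labels {0, 1}.
feats = [lambda y, x: float(y == x), lambda y, x: float(y == 1)]
data = [(0, 0), (1, 1), (0, 1), (1, 1)]  # (x, y) pairs

random.seed(1)
for _ in range(500):
    a = [random.uniform(-3, 3) for _ in feats]
    b = [random.uniform(-3, 3) for _ in feats]
    t = random.random()
    mid = [t * u + (1 - t) * v for u, v in zip(a, b)]
    # Concavity: L(t a + (1-t) b) >= t L(a) + (1-t) L(b), up to float tolerance.
    assert log_likelihood(mid, data, feats) >= (
        t * log_likelihood(a, data, feats)
        + (1 - t) * log_likelihood(b, data, feats)
        - 1e-9
    )
```

As with the earlier checks, this only samples the concavity inequality; the lemmas above are what guarantee it for all λ.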


Reference

1. Boyd, S. and Vandenberghe, L. Convex Optimization. Cambridge University Press, March 2004.