Sgardine proof of concavity of the CRF log-likelihood
Since it came up several times in class, I decided to prove the concavity of the CRF log-likelihood to my own satisfaction. I believe this could also be proven more generally for exponential-family distributions, but I have proved the special case directly here.
Contents
- 1 Lemma 1: linear functions are convex (and concave)
- 2 Lemma 2: Log-sum-exp is convex
- 3 Lemma 3: sums of concave functions are concave
- 4 Lemma 4: If h: R^k → R is convex and nondecreasing in each argument, and g: R^n → R^k is convex in each component, then f(x) = h(g(x)) is convex
- 5 Lemma 5: Log-sum-exp is nondecreasing in each argument
- 6 Proposition: The log-likelihood of a CRF is concave
- 7 Reference
Lemma 1: linear functions are convex (and concave)
Direct consequence of the definition of convexity, BV (3.1)
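To spell this out (notation mine, following BV (3.1)): a function f is convex if, for all x, y and 0 ≤ θ ≤ 1,

f(\theta x + (1-\theta) y) \le \theta f(x) + (1-\theta) f(y),

and concave if the reverse inequality holds. A linear (affine) function f(x) = a^T x + b satisfies

f(\theta x + (1-\theta) y) = a^T(\theta x + (1-\theta) y) + b = \theta (a^T x + b) + (1-\theta)(a^T y + b) = \theta f(x) + (1-\theta) f(y),

i.e. both inequalities hold with equality, so f is both convex and concave.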
Lemma 2: Log-sum-exp is convex
The function f(x_1, \dots, x_n) = \log \sum_{i=1}^{n} \exp(x_i) is convex, as can be seen from its Hessian; see BV (3.1.5, pg. 73).
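A sketch of the Hessian computation from BV (3.1.5), in my notation with z_i = e^{x_i}:

\nabla^2 f(x) = \frac{1}{(\mathbf{1}^T z)^2} \left( (\mathbf{1}^T z)\, \mathrm{diag}(z) - z z^T \right),

so for any vector v,

v^T \nabla^2 f(x)\, v = \frac{\big(\sum_i z_i\big)\big(\sum_i z_i v_i^2\big) - \big(\sum_i z_i v_i\big)^2}{\big(\sum_i z_i\big)^2} \ge 0,

where the numerator is nonnegative by the Cauchy-Schwarz inequality applied to the vectors with components v_i \sqrt{z_i} and \sqrt{z_i}. A Hessian that is positive semidefinite everywhere implies convexity.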
Lemma 3: sums of concave functions are concave
BV (3.2.1)
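Spelling out the two-summand case used below (BV (3.2.1) covers the general nonnegative weighted sum): if f and g are concave, then for 0 ≤ θ ≤ 1,

(f+g)(\theta x + (1-\theta) y) = f(\theta x + (1-\theta) y) + g(\theta x + (1-\theta) y) \ge \theta\,(f+g)(x) + (1-\theta)\,(f+g)(y),

so f + g is concave, and by induction the same holds for any finite sum.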
Lemma 4: If h: R^k → R is convex and nondecreasing in each argument, and g: R^n → R^k is convex in each component, then f(x) = h(g(x)) is convex
BV (3.2.4, pg. 86)
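For completeness, a one-line paraphrase of the argument in BV (3.2.4): since each component of g is convex, g(\theta x + (1-\theta) y) \le \theta g(x) + (1-\theta) g(y) holds componentwise, and monotonicity and convexity of h then give

h\big(g(\theta x + (1-\theta) y)\big) \le h\big(\theta g(x) + (1-\theta) g(y)\big) \le \theta\, h(g(x)) + (1-\theta)\, h(g(y)),

which is exactly the convexity of f = h \circ g.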
Lemma 5: Log-sum-exp is nondecreasing in each argument
The partial derivative with respect to x_i is \big[\sum_j \exp x_j\big]^{-1} \exp x_i, which is always positive.
Proposition: The log-likelihood of a CRF is concave
The log-likelihood (as stated, e.g., in Sha & Pereira 2003) is

L(\lambda) = \sum_k \Big[ \sum_i \lambda_i f(y_k, x_k, i) \;-\; \log \sum_{y} \exp \sum_i \lambda_i f(y, x_k, i) \Big],

where the outer sum runs over the training pairs (x_k, y_k), the sum inside the log runs over all candidate label sequences y, and the second term is the log partition function \log Z_\lambda(x_k).
Now each \lambda_i f(y_k, x_k, i) is linear in \lambda (the feature values are constants) and hence both convex and concave (Lemma 1).
The first inner sum, \sum_i \lambda_i f(y_k, x_k, i), is then concave by Lemma 3.
The subtracted term (a.k.a. \log Z_\lambda(x_k)) is log-sum-exp applied to the functions \sum_i \lambda_i f(y, x_k, i), each of which is linear in \lambda and hence convex (Lemma 1); since log-sum-exp is convex (Lemma 2) and nondecreasing in each argument (Lemma 5), the composition is convex by Lemma 4, and its additive inverse is concave.
The bracketed term, a sum of these two concave pieces, is then concave by Lemma 3, and the whole log-likelihood, a sum over k of concave terms, is concave by Lemma 3 again.
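To make the structure explicit (notation mine): writing a_y for the feature vector (f(y, x_k, 1), f(y, x_k, 2), \dots) of a label sequence y on the k-th training example, the bracketed term is

\lambda^T a_{y_k} \;-\; \log \sum_{y} \exp\big(\lambda^T a_{y}\big),

a linear (hence concave) function of \lambda minus a convex one (log-sum-exp composed with linear functions of \lambda), and L(\lambda) is a sum of such concave terms.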
Reference
1. Boyd, S. and Vandenberghe, L., Convex Optimization, Cambridge University Press, March 2004. (Cited above as BV.)