Difference between revisions of "Expectation Regularization"

Revision as of 16:53, 30 November 2010

This method was introduced by G. S. Mann and A. McCallum (ICML 2007). It is typically used as a regularization term added to the likelihood function. In practice, people often have prior knowledge of the label distribution, and this method provides a way to take advantage of that knowledge.

Let ${\tilde {p}}$ denote the human-provided label prior and ${\hat {p}}$ the label distribution predicted by the model. We minimize the distance between ${\tilde {p}}$ and ${\hat {p}}$ . KL divergence is used here, so the regularization term becomes

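$D({\tilde {p}}||{\hat {p}})=\sum _{y}{\tilde {p}}(y)\log {\frac {{\tilde {p}}(y)}{{\hat {p}}(y)}}=H({\tilde {p}},{\hat {p}})-H({\tilde {p}})$

Since the entropy $H({\tilde {p}})$ does not depend on the model, minimizing this divergence is equivalent to minimizing the cross-entropy $H({\tilde {p}},{\hat {p}})$ .

For semi-supervised learning, we can augment the objective function with this regularization term. For example, the new conditional log-likelihood of the data becomes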

$l(\theta ;D,U)=\sum _{n}\log p_{\theta }(y^{(n)}|x^{(n)})-\lambda D({\tilde {p}}||{\hat {p}})$

Here the sum runs over the labeled examples in $D$ , ${\hat {p}}$ is computed from the model's predictions on the unlabeled data $U$ , and $\lambda$ weights the regularization term.
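As a rough illustration, the following Python sketch evaluates a regularized objective of the form "labeled log-likelihood minus $\lambda$ times the KL term". It rests on several assumptions that are not stated on this page: ${\hat {p}}$ is taken to be the average of the model's posteriors $p_{\theta }(y|x)$ over the unlabeled examples, all function names are hypothetical, and the "model outputs" are hard-coded toy probabilities rather than predictions of a trained classifier.

```python
import math


def expected_label_distribution(probs):
    """Average per-example posteriors p_theta(y|x) over the unlabeled data.

    Assumption: this average plays the role of p_hat in the text.
    """
    n = len(probs)
    k = len(probs[0])
    return [sum(row[j] for row in probs) / n for j in range(k)]


def kl_divergence(p_tilde, p_hat, eps=1e-12):
    """D(p_tilde || p_hat) = sum_y p_tilde(y) * log(p_tilde(y) / p_hat(y)).

    Terms with p_tilde(y) == 0 contribute nothing; p_hat is clipped at eps
    to avoid division by zero (an implementation choice for this sketch).
    """
    return sum(
        pt * math.log(pt / max(ph, eps))
        for pt, ph in zip(p_tilde, p_hat)
        if pt > 0
    )


def objective(labeled_probs, labels, unlabeled_probs, p_tilde, lam):
    """Labeled conditional log-likelihood minus lam * KL(prior || p_hat)."""
    log_lik = sum(math.log(row[y]) for row, y in zip(labeled_probs, labels))
    p_hat = expected_label_distribution(unlabeled_probs)
    return log_lik - lam * kl_divergence(p_tilde, p_hat)


# Toy data (hypothetical): two labeled and three unlabeled examples, two classes.
labeled_probs = [[0.9, 0.1], [0.2, 0.8]]    # p_theta(y|x) on labeled data D
labels = [0, 1]                              # gold labels for D
unlabeled_probs = [[0.7, 0.3], [0.6, 0.4], [0.8, 0.2]]  # p_theta(y|x) on U
p_tilde = [0.5, 0.5]                         # human-provided label prior

p_hat = expected_label_distribution(unlabeled_probs)
value = objective(labeled_probs, labels, unlabeled_probs, p_tilde, lam=1.0)
print("p_hat:", p_hat)
print("objective:", value)
```

In this toy setting the model over-predicts the first class on the unlabeled data relative to the uniform prior, so the KL term is positive and lowers the objective. In an actual training loop the probabilities would depend on $\theta$ and the objective would be maximized with a gradient-based optimizer.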
