Difference between revisions of "Expectation Regularization"

From Cohen Courses
Line 17:
 l(\theta; D, U)=\sum_{n}\text{log}p_{\theta}(y^{(n)}|x^{(n)}) - \lambda \triangle(\tilde{p}, \hat{p})
 </math>
+
+Note that this is a global regularizer instead of a local one, in which case it would assign all instances to the majority of
+the class.

Revision as of 20:19, 30 November 2010

This is a method introduced by G. S. Mann and A. McCallum (ICML 2007). It is typically used as a regularization term added to the likelihood function. In practice, humans often have some insight into the prior distribution over labels, and this method provides a way to take advantage of that prior knowledge.

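Let's denote the human-provided prior as <math>\tilde{p}</math> and the label distribution the model predicts on the unlabeled data as <math>\hat{p}</math>. We minimize the distance between <math>\tilde{p}</math> and <math>\hat{p}</math>. The KL divergence is used as the distance here, so the regularization term becomes

<math>
\triangle(\tilde{p}, \hat{p}) = D(\tilde{p} \| \hat{p}) = \sum_{y} \tilde{p}(y) \log \frac{\tilde{p}(y)}{\hat{p}(y)}
</math>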

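For semi-supervised learning, we can augment the objective function by adding this regularization term. For example, the new conditional log-likelihood of the data becomes

<math>
l(\theta; D, U)=\sum_{n}\log p_{\theta}(y^{(n)}|x^{(n)}) - \lambda \triangle(\tilde{p}, \hat{p})
</math>

where <math>D</math> is the labeled data, <math>U</math> is the unlabeled data, and <math>\lambda</math> controls the weight of the regularizer.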
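The augmented objective can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the model is represented by precomputed probability vectors, the model's label distribution on the unlabeled data is taken to be the average of its per-instance predictions, and all function names (`expected_label_distribution`, `kl_divergence`, `regularized_objective`) are hypothetical.

```python
import math

def expected_label_distribution(unlabeled_probs):
    """Average the per-instance label distributions predicted on unlabeled data."""
    n = len(unlabeled_probs)
    k = len(unlabeled_probs[0])
    return [sum(p[j] for p in unlabeled_probs) / n for j in range(k)]

def kl_divergence(p_tilde, p_hat, eps=1e-12):
    """KL(p_tilde || p_hat); labels with p_tilde(y) = 0 contribute nothing."""
    return sum(
        pt * math.log(pt / max(ph, eps))
        for pt, ph in zip(p_tilde, p_hat)
        if pt > 0
    )

def regularized_objective(labeled_probs, labels, unlabeled_probs, p_tilde, lam):
    """Labeled conditional log-likelihood minus lam * KL(p_tilde || p_hat)."""
    log_lik = sum(math.log(p[y]) for p, y in zip(labeled_probs, labels))
    p_hat = expected_label_distribution(unlabeled_probs)
    return log_lik - lam * kl_divergence(p_tilde, p_hat)

# Toy inputs (made up for illustration): two labeled and two unlabeled instances.
labeled_probs = [[0.9, 0.1], [0.2, 0.8]]    # model predictions on labeled data
labels = [0, 1]                             # gold labels
unlabeled_probs = [[0.6, 0.4], [0.8, 0.2]]  # model predictions on unlabeled data
p_tilde = [0.7, 0.3]                        # human-provided label prior

objective = regularized_objective(labeled_probs, labels, unlabeled_probs, p_tilde, lam=1.0)
```

In this toy case the averaged prediction on the unlabeled data matches the prior, so the penalty is essentially zero and the objective reduces to the labeled log-likelihood. In an actual training setup the probabilities would come from the parameterized model and the objective would be maximized with respect to its parameters.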

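Note that this is a global regularizer rather than a local one: it constrains the label distribution aggregated over all instances, not the prediction for each individual instance. A local regularizer would instead push every instance toward the prior, and so would assign all instances to the majority class.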
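A small numeric sketch may make the global versus local distinction concrete. It assumes one possible reading of "local", namely applying the same KL penalty to each instance's prediction separately and averaging. That per-instance formulation, the toy predictions, and the helper names are all assumptions made for illustration.

```python
import math

def kl(p, q):
    """KL(p || q) for distributions with strictly positive q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def global_penalty(preds, p_tilde):
    """Penalty on the label distribution averaged over all instances."""
    n = len(preds)
    p_hat = [sum(p[j] for p in preds) / n for j in range(len(p_tilde))]
    return kl(p_tilde, p_hat)

def local_penalty(preds, p_tilde):
    """Assumed 'local' variant: average of per-instance penalties."""
    return sum(kl(p_tilde, p) for p in preds) / len(preds)

p_tilde = [0.7, 0.3]

# Confident predictions: 7 instances lean to class 0 and 3 lean to class 1.
confident = [[0.99, 0.01]] * 7 + [[0.01, 0.99]] * 3
# Every instance simply copies the prior.
copies = [list(p_tilde) for _ in range(10)]

global_confident = global_penalty(confident, p_tilde)
local_confident = local_penalty(confident, p_tilde)
local_copies = local_penalty(copies, p_tilde)

# Under the local variant the preferred solution is `copies`, where the most
# probable label of every instance is the majority class (index 0).
copies_argmax = [max(range(len(p)), key=lambda j: p[j]) for p in copies]
```

The global penalty is close to zero for the confident predictions because their average is near the prior, whereas the assumed local penalty heavily penalizes them and is only satisfied when each instance reproduces the prior, which labels every instance with the majority class.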