Expectation Regularization

Expectation regularization is a method introduced in G. S. Mann and A. McCallum, ICML 2007. It typically serves as a regularization term added to the likelihood function. In practice, humans often have insight into the prior distribution of the labels, and this method provides a way to take advantage of that prior knowledge.

Let us denote the human-provided label prior as <math>\tilde{p}</math> and the model's predicted label distribution as <math>\hat{p}</math>. We minimize the distance between <math>\tilde{p}</math> and <math>\hat{p}</math>. The KL divergence is used here, so the regularization term becomes

<math>
D(\tilde{p}||\hat{p})=\sum_{y} \tilde{p}(y) \log \frac{\tilde{p}(y)}{\hat{p}(y)}=H(\tilde{p},\hat{p})-H(\tilde{p})
</math>
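As a quick numerical check of this identity (KL divergence equals cross-entropy minus entropy), here is a minimal Python sketch; the three-class prior and predicted distribution are made-up values for illustration:

<pre>
import numpy as np

# Hypothetical human-provided label prior (tilde-p) and the model's
# predicted label distribution (hat-p); both are made up and sum to 1.
p_tilde = np.array([0.7, 0.2, 0.1])
p_hat   = np.array([0.5, 0.3, 0.2])

# KL divergence D(p_tilde || p_hat) computed directly from its definition.
kl = np.sum(p_tilde * np.log(p_tilde / p_hat))

# The same quantity via cross-entropy H(p_tilde, p_hat) minus entropy H(p_tilde).
cross_entropy = -np.sum(p_tilde * np.log(p_hat))
entropy = -np.sum(p_tilde * np.log(p_tilde))

assert np.isclose(kl, cross_entropy - entropy)
print(kl)
</pre>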

For semi-supervised learning purposes, we can augment the objective function by adding this regularization term to the conditional likelihood of the data, giving a new objective.
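A minimal sketch of what such an augmented objective could look like, assuming a conditional model <math>p_\theta(y|x)</math>, a labeled set <math>L</math>, an unlabeled set <math>U</math>, and a regularization weight <math>\lambda</math> (these symbols and the exact form are illustrative assumptions, not taken from the original text):

<math>
O(\theta)=\sum_{(x,y)\in L}\log p_\theta(y|x)-\lambda\, D(\tilde{p}||\hat{p}), \qquad \hat{p}(y)=\frac{1}{|U|}\sum_{x\in U} p_\theta(y|x)
</math>

Here the first term is the usual conditional log-likelihood on the labeled data, and the second term penalizes deviation of the model's average predicted label distribution on the unlabeled data from the human-provided prior.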