Expectation Regularization
This is a method introduced in G. S. Mann and A. McCallum, ICML 2007. It typically serves as a regularization term added to the likelihood function. In practice, humans often have insight into the prior distribution of labels, and this method provides a way to take advantage of that prior knowledge.
Let's denote the human-provided prior as <math>\tilde{p}</math> and the model's expected label distribution as <math>\hat{p}</math>. We minimize the distance between <math>\tilde{p}</math> and <math>\hat{p}</math>. The KL-divergence is used here, so the regularization term becomes

<math>
D(\tilde{p}||\hat{p})=\sum_{y} \tilde{p}(y) \log \frac{\tilde{p}(y)}{\hat{p}(y)}=H(\tilde{p},\hat{p})-H(\tilde{p})
</math>
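As a concrete illustration, the sketch below computes this KL term in NumPy, assuming <math>\hat{p}</math> is the model's predicted label distribution averaged over the unlabeled data; the function and variable names are illustrative only.

<pre>
import numpy as np

def expectation_regularizer(p_tilde, p_hat, eps=1e-12):
    """KL(p_tilde || p_hat): penalizes deviation of the model's expected
    label distribution p_hat from the human-provided prior p_tilde."""
    p_tilde = np.asarray(p_tilde, dtype=float)
    p_hat = np.asarray(p_hat, dtype=float)
    return float(np.sum(p_tilde * np.log((p_tilde + eps) / (p_hat + eps))))

# Example: the prior says 90% of instances are negative and 10% positive,
# while the model currently predicts a 70/30 split on the unlabeled data.
p_tilde = [0.9, 0.1]   # human-provided label prior
p_hat = [0.7, 0.3]     # model's expected label distribution over unlabeled data
print(expectation_regularizer(p_tilde, p_hat))  # approximately 0.12 nats
</pre>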
For semi-supervised learning purposes, we can augment the objective function by adding a regularization term; for example, the conditional likelihood of the data is combined with the KL term above, as sketched below.
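A minimal sketch of such an augmented objective, assuming the regularization is applied to the model's expected label distribution <math>\hat{p}_\theta</math> over the unlabeled data and weighted by a trade-off parameter <math>\lambda</math> (both of which are assumptions of this sketch rather than details given above):

<math>
O(\theta)=\sum_{(x,y)\in D_L} \log p_\theta(y \mid x) - \lambda\, D(\tilde{p}\,||\,\hat{p}_\theta)
</math>

Maximizing <math>O(\theta)</math> fits the labeled data <math>D_L</math> while keeping the predicted label proportions on the unlabeled data close to the human-provided prior <math>\tilde{p}</math>.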