Difference between revisions of "Expectation Regularization"

From Cohen Courses
Line 6:
  We minimizes the distance between <math> \tilde{p} </math> and <math> \hat{p} </math>.
  KL-distance is used here so the regularization becomes
+
  <math>
- D(\tilde{p}||\hat{p})=\sum_{y} \tilde{p}(y) \text{log} \frac{\tilde{p}(y)}{\hat{p}(y)}
+ D(\tilde{p}||\hat{p})=\sum_{y} \tilde{p}(y) \text{log} \frac{\tilde{p}(y)}{\hat{p}(y)}=H(\tilde{p},\hat{p})-H(\tilde{p})
  </math>

Revision as of 17:47, 30 November 2010

This method was introduced by G. S. Mann and A. McCallum (ICML 2007). It often serves as a regularization term added to the likelihood function. In practice, humans often have some insight into the prior distribution over labels, and this method provides a way to take advantage of that prior knowledge.

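Let <math> \tilde{p} </math> denote the human-provided prior. We minimize the distance between <math> \tilde{p} </math> and <math> \hat{p} </math>. The KL divergence is used as the distance here, so the regularization term becomes

<math>
D(\tilde{p}||\hat{p})=\sum_{y} \tilde{p}(y) \log \frac{\tilde{p}(y)}{\hat{p}(y)}=H(\tilde{p},\hat{p})-H(\tilde{p})
</math>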
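As a rough illustration, the KL term between a human-provided label prior and a second label distribution can be computed as in the Python sketch below. It is not taken from the paper. It assumes that the second distribution is the average of the model's predicted label distributions over a set of examples, which this page does not state, and all function names and numbers are hypothetical. The sketch also checks numerically that the KL term equals the cross entropy minus the entropy of the prior.

```python
import math

def mean_prediction(predictions):
    # Assumption: p_hat is the average of the per-example predicted
    # label distributions (hypothetical choice for this sketch).
    n = len(predictions)
    k = len(predictions[0])
    return [sum(row[y] for row in predictions) / n for y in range(k)]

def entropy(p):
    # H(p) = -sum_y p(y) log p(y); terms with p(y) = 0 contribute 0.
    return -sum(py * math.log(py) for py in p if py > 0)

def cross_entropy(p, q):
    # H(p, q) = -sum_y p(y) log q(y); requires q(y) > 0 wherever p(y) > 0.
    return -sum(py * math.log(qy) for py, qy in zip(p, q) if py > 0)

def kl_divergence(p, q):
    # D(p || q) = sum_y p(y) log(p(y) / q(y)); requires q(y) > 0 wherever p(y) > 0.
    return sum(py * math.log(py / qy) for py, qy in zip(p, q) if py > 0)

# Hypothetical two-label example.
prior = [0.7, 0.3]                                   # human-provided prior
predictions = [[0.9, 0.1], [0.6, 0.4], [0.3, 0.7]]   # made-up model outputs
p_hat = mean_prediction(predictions)

penalty = kl_divergence(prior, p_hat)
decomposed = cross_entropy(prior, p_hat) - entropy(prior)
```

Since the entropy of the prior does not depend on the model, minimizing the KL term with respect to the model is equivalent to minimizing the cross entropy alone.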