Expectation Regularization
This is a method introduced in G. S. Mann and A. McCallum, ICML 2007. It often serves as a regularization term added to the likelihood function. In practice, humans often have insight into the prior distribution of the labels, and this method introduces a way to take advantage of that prior knowledge.
Let us denote the human-provided prior as <math> \tilde{p} </math> and the empirical label distribution as <math> \hat{p} </math>. The empirical label distribution is computed over the unlabeled data set <math>U</math>:

<math>
\hat{p}_{\theta}(y)=\frac{\sum_{x \in U} p_{\theta}(y|x)}{|U|}
</math>
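To make this concrete, here is a minimal sketch of the averaging step; the function name and the array layout of the model's predictions are illustrative assumptions, not from the paper.

<pre>
import numpy as np

def empirical_label_distribution(probs_unlabeled):
    """Estimate the empirical label distribution p_hat_theta(y) by averaging
    the model's predicted label probabilities over the unlabeled set U.

    probs_unlabeled: array of shape (|U|, num_labels), where
    probs_unlabeled[i, y] = p_theta(y | x_i).
    """
    probs_unlabeled = np.asarray(probs_unlabeled, dtype=float)
    return probs_unlabeled.mean(axis=0)
</pre>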
We minimize the distance between <math> \tilde{p} </math> and <math> \hat{p} </math>. The KL-divergence is used here, so the regularization term becomes

<math>
\Delta(\tilde{p}, \hat{p}_{\theta}) = \sum_{y} \tilde{p}(y) \log \frac{\tilde{p}(y)}{\hat{p}_{\theta}(y)}.
</math>
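A small sketch of this penalty, again with illustrative names, assuming both distributions are given as probability vectors; the eps smoothing is an implementation convenience to avoid log(0), not part of the formula.

<pre>
import numpy as np

def kl_regularizer(p_tilde, p_hat, eps=1e-12):
    """KL divergence between the human-provided prior p_tilde and the
    empirical label distribution p_hat."""
    p_tilde = np.asarray(p_tilde, dtype=float)
    p_hat = np.asarray(p_hat, dtype=float)
    return float(np.sum(p_tilde * np.log((p_tilde + eps) / (p_hat + eps))))

# Example: a 70/30 prior versus a model currently predicting 50/50 on average
# kl_regularizer([0.7, 0.3], [0.5, 0.5])  ->  roughly 0.082
</pre>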
For semi-supervised learning purposes, we can augment the objective function by adding this regularization term. For example, the new conditional likelihood of the data becomes

<math>
\sum_{(x,y) \in D} \log p_{\theta}(y|x) - \lambda \, \Delta(\tilde{p}, \hat{p}_{\theta}),
</math>

where <math>D</math> is the labeled data set and <math>\lambda</math> controls the strength of the regularization.
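Putting the pieces together, a hedged sketch of the augmented objective is shown below; the function signature, array shapes, and the default value of lam are illustrative assumptions rather than the authors' implementation.

<pre>
import numpy as np

def regularized_objective(log_probs_labeled, labels, probs_unlabeled, p_tilde, lam=1.0, eps=1e-12):
    """Conditional log-likelihood on the labeled data minus lam times the
    expectation-regularization penalty computed on the unlabeled data.

    log_probs_labeled: (n_labeled, num_labels) log p_theta(y | x) for labeled x
    labels:            (n_labeled,) gold label indices
    probs_unlabeled:   (n_unlabeled, num_labels) p_theta(y | x) for unlabeled x
    p_tilde:           (num_labels,) human-provided label prior
    """
    log_probs_labeled = np.asarray(log_probs_labeled, dtype=float)
    labels = np.asarray(labels, dtype=int)
    log_lik = log_probs_labeled[np.arange(len(labels)), labels].sum()
    p_hat = np.asarray(probs_unlabeled, dtype=float).mean(axis=0)  # empirical label distribution over U
    kl = np.sum(np.asarray(p_tilde) * np.log((np.asarray(p_tilde) + eps) / (p_hat + eps)))
    return log_lik - lam * kl
</pre>

In practice one would typically maximize an objective of this form with a gradient-based optimizer over the model parameters <math>\theta</math>.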
Note that this is a global regularizer rather than a local one; a local (per-instance) regularizer would tend to assign all instances to the majority class.