Expectation Regularization
This method was introduced in G. S. Mann and A. McCallum, "Simple, Robust, Scalable Semi-supervised Learning via Expectation Regularization", ICML 2007. It typically serves as a regularization term added to the likelihood function. In practice, humans often have insight into the prior distribution of labels, and this method provides a way to take advantage of that prior knowledge.
Let us denote the human-provided prior as <math>\tilde{p}</math> and the empirical label distribution as <math>\hat{p}</math>. The empirical label distribution is computed over the unlabeled data set <math>U</math>:

<math>
\hat{p}_{\theta}(y)=\frac{\sum_{x \in U} p_{\theta}(y|x)}{|U|}
</math>
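As a concrete sketch, the empirical label distribution can be obtained by averaging the model's predicted class probabilities over <math>U</math>. The snippet below is illustrative, not from the paper; <code>predict_proba</code> is an assumed model function returning a probability vector for each instance.

<syntaxhighlight lang="python">
import numpy as np

def empirical_label_distribution(predict_proba, unlabeled_x):
    """Average predicted label distribution hat{p}_theta(y) over the unlabeled set U."""
    # predict_proba is a hypothetical model function mapping an instance x
    # to a length-K vector of class probabilities p_theta(y | x).
    probs = np.array([predict_proba(x) for x in unlabeled_x])  # shape (|U|, K)
    return probs.mean(axis=0)
</syntaxhighlight>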
We want to minimize the distance between <math>\tilde{p}</math> and <math>\hat{p}</math>, denoted <math>\triangle(\hat{p},\tilde{p})</math>. The KL divergence is used here, so the regularization term becomes

<math>
D(\tilde{p}\|\hat{p})=\sum_{y} \tilde{p}(y) \log \frac{\tilde{p}(y)}{\hat{p}(y)}=H(\tilde{p},\hat{p})-H(\tilde{p})
</math>
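A minimal sketch of this regularizer under the same assumptions, with the prior and empirical distributions given as probability vectors (a small <code>eps</code> guards against taking the log of zero):

<syntaxhighlight lang="python">
import numpy as np

def kl_regularizer(p_tilde, p_hat, eps=1e-12):
    """D(p_tilde || p_hat) = sum_y p_tilde(y) * log(p_tilde(y) / p_hat(y))."""
    p_tilde = np.asarray(p_tilde, dtype=float)
    p_hat = np.asarray(p_hat, dtype=float)
    # eps avoids log(0) when the model assigns a class zero probability.
    return float(np.sum(p_tilde * (np.log(p_tilde + eps) - np.log(p_hat + eps))))
</syntaxhighlight>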
For semi-supervised learning, we can augment the objective function by adding this regularization term. For example, the new conditional likelihood of the data becomes

<math>
l(\theta; D, U)=\sum_{n}\log p_{\theta}(y^{(n)}|x^{(n)}) - \lambda \triangle(\tilde{p}, \hat{p})
</math>
where <math>D</math> is the labeled data set and <math>\lambda</math> controls the strength of the regularization.
Note that this is a global regularizer rather than a local (per-instance) one: it constrains the average predicted label distribution over the whole unlabeled set. Applying the same constraint locally to each instance would instead push every instance toward the majority class. A sketch of the full objective follows below.
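Putting the pieces together, here is a sketch of the augmented objective, written as a loss to be minimized, i.e., the negative of <math>l(\theta; D, U)</math>. The names <code>predict_proba</code>, <code>labeled_xy</code>, and <code>lam</code> are illustrative assumptions, not from the paper:

<syntaxhighlight lang="python">
import numpy as np

def regularized_loss(predict_proba, labeled_xy, unlabeled_x, p_tilde,
                     lam=1.0, eps=1e-12):
    """Negative of l(theta; D, U): labeled NLL plus lambda * D(p_tilde || p_hat)."""
    # Supervised term over the labeled set D: -sum_n log p_theta(y_n | x_n).
    nll = -sum(np.log(predict_proba(x)[y] + eps) for x, y in labeled_xy)
    # Global term: the regularizer only sees the *average* prediction over U,
    # so no single instance is forced to match the prior by itself.
    p_hat = np.mean([predict_proba(x) for x in unlabeled_x], axis=0)
    p_tilde = np.asarray(p_tilde, dtype=float)
    kl = float(np.sum(p_tilde * (np.log(p_tilde + eps) - np.log(p_hat + eps))))
    return nll + lam * kl
</syntaxhighlight>

Minimizing this loss with respect to the model parameters trades off fitting the labeled data against keeping the aggregate predicted label distribution close to the human-provided prior.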