Entropy Minimization for Semi-supervised Learning

Minimum entropy regularization can be applied to any model of the posterior distribution.

The learning set is denoted <math>L_{n} = \{X^{(i)}, Z^{(i)}\}^{n}_{i=1}</math>, where <math>Z^{(i)} \in \{0,1\}^{K}</math>: if <math>X^{(i)}</math> is labeled as <math>w_{k}</math>, then <math>Z^{(i)}_{k} = 1</math> and <math>Z^{(i)}_{l} = 0</math> for <math>l \neq k</math>; if <math>X^{(i)}</math> is unlabeled, then <math>Z^{(i)}_{l} = 1</math> for <math>l = 1, \ldots, K</math>.
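
As an illustration of this encoding (a minimal sketch in NumPy, not part of the original article; the function and variable names are hypothetical), the indicator matrix <math>Z</math> for a partially labeled data set with <math>K</math> classes can be built as follows.

<pre>
import numpy as np

def build_label_indicators(labels, num_classes):
    """Build the (n, K) indicator matrix Z for a partially labeled set.

    labels: list of class indices in {0, ..., K-1}, or None when unlabeled.
    A labeled point gets a one-hot row; an unlabeled point gets an
    all-ones row (no label information rules out any class).
    """
    n = len(labels)
    Z = np.zeros((n, num_classes), dtype=int)
    for i, y in enumerate(labels):
        if y is None:        # unlabeled: Z^{(i)}_l = 1 for all l
            Z[i, :] = 1
        else:                # labeled as class y: one-hot indicator
            Z[i, y] = 1
    return Z

# Example: three points, K = 3 classes; the second point is unlabeled.
Z = build_label_indicators([0, None, 2], num_classes=3)
# Z == [[1, 0, 0],
#       [1, 1, 1],
#       [0, 0, 1]]
</pre>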

The conditional entropy of the class labels given the observed variables is

<math>
H(Y|X,Z; L_{n}) = -\frac{1}{n} \sum^{n}_{i=1} \sum^{K}_{k=1} P(Y^{(i)}=w_{k}|X^{(i)}, Z^{(i)}) \log P(Y^{(i)}=w_{k}|X^{(i)},Z^{(i)})
</math>
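
As a concrete illustration (again a hypothetical NumPy sketch, assuming the per-example posteriors are available as an <math>(n, K)</math> array), the empirical conditional entropy above can be computed directly from those posteriors.

<pre>
import numpy as np

def conditional_entropy(posteriors, eps=1e-12):
    """Empirical conditional entropy H(Y|X,Z; L_n).

    posteriors: (n, K) array whose i-th row holds
                P(Y^{(i)} = w_k | X^{(i)}, Z^{(i)}) for k = 1, ..., K.
    Returns the average entropy over the n examples.
    """
    p = np.clip(posteriors, eps, 1.0)   # avoid log(0); zero entries still contribute 0
    return -np.mean(np.sum(posteriors * np.log(p), axis=1))
</pre>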

Assuming that labels are missing at random, we have that

<math>
P(Y^{(i)}=w_{k}|X^{(i)}, Z^{(i)}) = \frac{Z^{(i)}_{k} P(Y^{(i)}=w_{k}|X^{(i)})}{\sum^{K}_{l=1} Z^{(i)}_{l} P(Y^{(i)}=w_{l}|X^{(i)})}
</math>
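
Continuing the sketch above (hypothetical code; the model posteriors <math>P(Y^{(i)}=w_{k}|X^{(i)})</math> are assumed to be given as an <math>(n, K)</math> array), this posterior can be obtained by masking the model posteriors with <math>Z</math> and renormalizing each row.

<pre>
import numpy as np

def posterior_given_indicators(model_posteriors, Z, eps=1e-12):
    """Compute P(Y^{(i)} = w_k | X^{(i)}, Z^{(i)}).

    model_posteriors: (n, K) array of P(Y^{(i)} = w_k | X^{(i)}).
    Z:                (n, K) 0/1 label-indicator matrix.
    Zeros out classes ruled out by Z, then renormalizes each row.
    """
    masked = Z * model_posteriors
    return masked / (masked.sum(axis=1, keepdims=True) + eps)

# Usage with the earlier sketches:
#   reg = conditional_entropy(posterior_given_indicators(model_posteriors, Z))
</pre>

Note that for a labeled example the masked row has a single non-zero entry, so the resulting posterior is the one-hot label and contributes zero entropy; only the unlabeled examples, whose <math>Z^{(i)}</math> is all ones, are penalized by the entropy term.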

The posterior distribution is defined as