Entropy Minimization for Semi-supervised Learning



Minimum entropy regularization can be applied to any model of the posterior distribution <math>P(Y|X)</math>.

The learning set is denoted <math>\mathcal{L}_{n} = \{X^{(i)}, Z^{(i)}\}^{n}_{i=1}</math>, where <math>Z^{(i)} \in \{0,1\}^{K}</math>: if <math>X^{(i)}</math> is labeled as <math>w_{k}</math>, then <math>Z^{(i)}_{k} = 1</math> and <math>Z^{(i)}_{l} = 0</math> for <math>l \neq k</math>; if <math>X^{(i)}</math> is unlabeled, then <math>Z^{(i)}_{l} = 1</math> for <math>l = 1, \ldots, K</math>.
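To make the indicator coding concrete, here is a minimal sketch in Python/NumPy; the class count, the toy labels, and the use of None to mark unlabeled points are illustrative choices, not part of the source.

<pre>
import numpy as np

K = 3                              # number of classes w_1, ..., w_K (toy value)
labels = [0, 2, None, 1, None]     # None marks an unlabeled example (toy data)

# Z[i, k] = 1 if example i is labeled as class k; an unlabeled example is
# compatible with every class, so its whole row is set to 1.
Z = np.zeros((len(labels), K))
for i, y in enumerate(labels):
    if y is None:
        Z[i, :] = 1.0              # unlabeled: Z^(i)_l = 1 for all l
    else:
        Z[i, y] = 1.0              # labeled as w_k: Z^(i)_k = 1, the rest stay 0

print(Z)
</pre>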

The conditional entropy of the class labels conditioned on the observed variables is

<math>
H(Y|X,Z; \mathcal{L}_{n}) = -\frac{1}{n} \sum^{n}_{i=1} \sum^{K}_{k=1} P(Y^{(i)}=w_{k}|X^{(i)}, Z^{(i)})\text{log}P(Y^{(i)}=w_{k}|X^{(i)},Z^{(i)})
</math>
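As a minimal numerical sketch of this quantity (the function name, the zero-probability clipping constant, and the example arrays are assumptions for illustration), the empirical conditional entropy can be computed from an <math>n \times K</math> array of posteriors:

<pre>
import numpy as np

def empirical_conditional_entropy(post):
    """H(Y|X,Z; L_n) for an (n, K) array `post` whose rows are the
    posteriors P(Y^(i)=w_k | X^(i), Z^(i)); 0*log(0) is treated as 0."""
    logp = np.log(np.clip(post, 1e-12, None))   # guard against log(0)
    return -np.mean(np.sum(post * logp, axis=1))

# Confident rows give low entropy; uniform rows give the maximum, log K.
print(empirical_conditional_entropy(np.array([[0.99, 0.01], [1.0, 0.0]])))
print(empirical_conditional_entropy(np.array([[0.5, 0.5], [0.5, 0.5]])))
</pre>

Driving this quantity down pushes the model toward confident predictions on the unlabeled examples.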

Assuming that labels are missing at random, we have

<math>
P(Y^{(i)}=w_{k}|X^{(i)}, Z^{(i)}) = \frac{Z^{(i)}_{k}P(Y^{(i)}=w_{k}|X^{(i)})}{\sum^{K}_{l=1} Z^{(i)}_{l} P(Y^{(i)}=w_{l}|X^{(i)})}
</math>
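A sketch of this conditioning step, assuming the unconditioned model posteriors <math>P(Y^{(i)}=w_{k}|X^{(i)})</math> are available as a NumPy array (the names prior_post and condition_on_labels are illustrative):

<pre>
import numpy as np

def condition_on_labels(prior_post, Z):
    """Turn the model posteriors P(Y^(i)=w_k | X^(i)) (rows of `prior_post`)
    into P(Y^(i)=w_k | X^(i), Z^(i)) by masking with Z and renormalizing."""
    masked = Z * prior_post                       # Z^(i)_k * P(w_k | X^(i))
    return masked / masked.sum(axis=1, keepdims=True)

prior_post = np.array([[0.2, 0.5, 0.3],           # an unlabeled example
                       [0.2, 0.5, 0.3]])          # the same input, labeled as w_1
Z = np.array([[1.0, 1.0, 1.0],
              [1.0, 0.0, 0.0]])
print(condition_on_labels(prior_post, Z))
# The unlabeled row is unchanged; the labeled row collapses onto its class.
</pre>

Masking by <math>Z^{(i)}</math> leaves unlabeled examples untouched and forces labeled examples onto their observed class.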

The maximum a posteriori (MAP) estimate is defined as the maximizer of the posterior distribution, that is, of the criterion

<math>
\begin{alignat}{2}
C(\boldsymbol{\theta}, \lambda; \mathcal{L}_{n}) & = L(\boldsymbol{\theta}; \mathcal{L}_{n}) - \lambda H(Y|X,Z; \mathcal{L}_{n}) \\
& = \sum^{n}_{i=1} \text{log}(\sum^{K}_{k=1} Z^{(i)}_{k}P(Y^{(i)}=w_{k}|X^{(i)})) + \lambda \sum^{n}_{i=1} \sum_{k=1}^{K} P(Y^{(i)}=w_{k}|X^{(i)}, Z^{(i)}) \text{log} P(Y^{(i)}=w_{k}|X^{(i)}, Z^{(i)})
\end{alignat}
</math>

where <math>L(\boldsymbol{\theta}; \mathcal{L}_{n})</math> is the conditional log-likelihood of the observed labels and <math>\lambda</math> weights the entropy regularizer, favoring low-entropy (confident) predictions on the unlabeled examples.
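Putting the pieces together, here is a sketch of the criterion under the same illustrative names; in practice the posteriors would come from the model being fit (parameterized by <math>\boldsymbol{\theta}</math>), whereas here they are fixed toy arrays.

<pre>
import numpy as np

def criterion(prior_post, Z, lam):
    """C(theta, lambda; L_n): log-likelihood of the observed labels plus
    lam times the sum of P log P over the conditioned posteriors."""
    masked = Z * prior_post                                 # Z^(i)_k * P(w_k | X^(i))
    log_lik = np.sum(np.log(masked.sum(axis=1)))            # sum_i log sum_k Z^(i)_k P(w_k | X^(i))
    post = masked / masked.sum(axis=1, keepdims=True)       # P(w_k | X^(i), Z^(i))
    neg_entropy = np.sum(post * np.log(np.clip(post, 1e-12, None)))
    return log_lik + lam * neg_entropy

prior_post = np.array([[0.2, 0.5, 0.3],    # posteriors from some fitted model
                       [0.7, 0.2, 0.1]])
Z = np.array([[1.0, 1.0, 1.0],             # unlabeled example
              [1.0, 0.0, 0.0]])            # example labeled as w_1
print(criterion(prior_post, Z, lam=0.5))
</pre>

In a full implementation this function would be maximized over <math>\boldsymbol{\theta}</math>, for example by gradient ascent on the parameters that produce the posteriors.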