Posterior Regularization for Expectation Maximization
Revision as of 22:57, 29 September 2011
Summary
The Expectation Maximization algorithm is a method for finding the maximum likelihood estimates for the parameters in a statistical model. During the E-step of this algorithm, posterior probabilities are calculated for the latent data by fixing the parameters.
In many fields, prior knowledge about the posterior probabilities is available and could be used to improve the statistical model, yet it is not clear how to incorporate such knowledge efficiently into EM.
Posterior Regularization is a method for imposing such constraints on the posteriors in the Expectation Maximization algorithm, allowing finer-grained control over them. The key advantage of this method is that the base model remains unchanged, while during learning it is driven to obey the constraints that are set.
Method Description
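For a given set of observed data <math>x</math>, a set of latent data <math>z</math> and a set of parameters <math>\theta</math>, the Expectation Maximization algorithm can be viewed as an alternation between two maximization steps of the function <math>F(q,\theta) = \log p_\theta(x) - D_{KL}(q(z|x) \,\|\, p_\theta(z|x))</math>, each taken over a different free variable.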
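The E-step is defined as:
<math>q^{t+1}(z|x) = \arg\max_{q(z|x)} F(q,\theta^t) = \arg\min_{q(z|x)} D_{KL}(q(z|x) \,\|\, p_{\theta^t}(z|x))</math>
where <math>D_{KL}</math> is the Kullback-Leibler divergence given by <math>D_{KL}(q \,\|\, p) = E_q\left[\log \frac{q}{p}\right]</math>, <math>q(z|x)</math> is an arbitrary probability distribution over the latent variable <math>z</math> and <math>p_{\theta^t}(z|x)</math> is the posterior probability for <math>z</math>, for the fixed parameters <math>\theta^t</math>.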
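The new <math>q^{t+1}</math> is then used in the M-step, which is defined as:
<math>\theta^{t+1} = \arg\max_\theta F(q^{t+1},\theta) = \arg\max_\theta E_{q^{t+1}}[\log p_\theta(x,z)]</math>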
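The alternation between the E-step and the M-step can be sketched on a toy model. The example below is an illustration only, not taken from the method's authors: it assumes a mixture of two biased coins, where each observation is the number of heads in `n` tosses and the latent variable `z` says which coin was used. All names (`e_step`, `m_step`, `free_energy`, `run_em`) and the data are hypothetical. The function being maximized is computed in the equivalent form E_q[log p(x,z)] - E_q[log q(z|x)].

```python
import math

def log_joint(x, n, z, theta):
    """log p_theta(x, z) for x heads in n tosses of coin z.
    The binomial coefficient is omitted: it does not depend on theta or z."""
    pi, p = theta
    return math.log(pi[z]) + x * math.log(p[z]) + (n - x) * math.log(1 - p[z])

def e_step(xs, n, theta):
    """Unconstrained E-step: q(z|x) is set to the exact posterior p_theta(z|x)."""
    qs = []
    for x in xs:
        logs = [log_joint(x, n, z, theta) for z in (0, 1)]
        m = max(logs)  # subtract the max for numerical stability
        w = [math.exp(l - m) for l in logs]
        s = sum(w)
        qs.append([wi / s for wi in w])
    return qs

def m_step(xs, n, qs):
    """M-step: closed-form maximizer of E_q[log p_theta(x, z)]."""
    pi, p = [], []
    for z in (0, 1):
        resp = sum(q[z] for q in qs)  # expected number of uses of coin z
        pi.append(resp / len(xs))
        p.append(sum(q[z] * x for q, x in zip(qs, xs)) / (n * resp))
    return pi, p

def free_energy(xs, n, qs, theta):
    """F(q, theta) = sum over data of E_q[log p_theta(x, z) - log q(z|x)]."""
    total = 0.0
    for x, q in zip(xs, qs):
        for z in (0, 1):
            if q[z] > 0:
                total += q[z] * (log_joint(x, n, z, theta) - math.log(q[z]))
    return total

def run_em(xs, n, theta, iters=20):
    """Alternate E- and M-steps, recording F after each E-step."""
    history = []
    for _ in range(iters):
        qs = e_step(xs, n, theta)
        history.append(free_energy(xs, n, qs, theta))
        theta = m_step(xs, n, qs)
    return theta, history

# Hypothetical data: heads out of 10 tosses per trial.
xs = [9, 8, 2, 1, 7, 3, 9, 2]
theta, history = run_em(xs, 10, ([0.5, 0.5], [0.6, 0.4]))
```

Because each step maximizes the same function over one of its arguments, the values in `history` should never decrease from one iteration to the next.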
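The goal of this method is to provide a way to impose constraints on posteriors, so that prior information can be encoded by defining a constraint set <math>Q_x</math> of allowed distributions over the latent variables <math>z</math>. The method does this by restricting <math>q(z|x)</math>. Thus, <math>Q_x</math> is defined as:
<math>Q_x = \{ q(z|x):\exists_\xi \ E_q[f(x,z)] - b_x \le \xi; \left \| \xi \right \|_2^2 < \epsilon^2 \}</math>
where <math>f(x,z)</math> are constraint features, <math>b_x</math> bounds their expected values, and <math>\xi</math> is a slack term whose norm is limited by <math>\epsilon</math>. This redefines the E-step as:
<math>q^{t+1}(z|x) = \arg\min_{q(z|x) \in Q_x} D_{KL}(q(z|x) \,\|\, p_{\theta^t}(z|x))</math>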
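The constrained E-step amounts to projecting the model posterior onto the constraint set under the KL divergence. The sketch below is a simplified illustration under stated assumptions, not the authors' implementation: it handles a single scalar feature over a finite set of latent values, drops the slack term (the limit where epsilon goes to zero), and relies on the standard dual form of this projection, in which the solution is proportional to p(z|x) * exp(-lambda * f(x,z)) for some lambda >= 0. The multiplier is found by bisection. The names `tilt`, `expectation` and `project` are hypothetical.

```python
import math

def tilt(p, f, lam):
    """Return q(z) proportional to p(z) * exp(-lam * f(z)).
    Shifting f by its minimum keeps every exponent non-positive for lam >= 0."""
    fmin = min(f)
    w = [pz * math.exp(-lam * (fz - fmin)) for pz, fz in zip(p, f)]
    s = sum(w)
    return [wi / s for wi in w]

def expectation(q, f):
    """E_q[f] for a discrete distribution q."""
    return sum(qz * fz for qz, fz in zip(q, f))

def project(p, f, b, tol=1e-10):
    """KL-project the posterior p onto {q : E_q[f] <= b} (no slack)."""
    if expectation(p, f) <= b:
        return list(p)  # constraint inactive: the posterior is already allowed
    if min(fz for pz, fz in zip(p, f) if pz > 0) >= b:
        # Simplifying assumption: require a strictly feasible point.
        raise ValueError("constraint cannot be met strictly on the support of p")
    lo, hi = 0.0, 1.0
    while expectation(tilt(p, f, hi), f) > b:
        hi *= 2.0  # E_q[f] decreases as lam grows, so this terminates
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expectation(tilt(p, f, mid), f) > b:
            lo = mid
        else:
            hi = mid
    return tilt(p, f, hi)

# Hypothetical posterior over three latent values and a feature to bound.
p = [0.1, 0.2, 0.7]
f = [0.0, 1.0, 2.0]
q = project(p, f, b=1.0)
```

In a full posterior-regularized EM loop, `project` would be applied to each posterior produced by the ordinary E-step before the M-step is run, which leaves the model and the M-step unchanged.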
Applications
Posterior regularization has been used to improve word alignment in Graca et al., 2010, by defining bijectivity and symmetry constraints over alignment posteriors.