Difference between revisions of "Posterior Regularization for Expectation Maximization"

Latest revision as of 14:11, 30 September 2011

Original Paper

Summary

The Expectation Maximization algorithm is a method for finding the maximum likelihood estimates for the parameters in a statistical model. During the E-step of this algorithm, posterior probabilities are calculated for the latent data by fixing the parameters.

In many fields, prior knowledge about the posterior probabilities can be applied to the model in order to improve the statistical model, yet the method to include such in the most efficient way in EM is not clear.

Posterior Regularization is a method used to impose such constraints on posteriors in the Expectation Maximization algorithm, allowing a finer-level control over these posteriors.

Method Description

For a given set $x\in X$ of observed data, a set of latent data $z\in Z$ and a set of parameters $\theta$ , the Expectation Maximization algorithm can be viewed as the alternation between two maximization steps of the function $F(q,\theta )$ , by marginalizing different free variables.

The E-step is defined as:

$q^{t+1}=argmax_{q}F(q,\theta ^{t})=argmax_{q}[-D_{KL}(q||p_{\theta ^{t}}(z|x))]=p_{\theta ^{t}}(z|x)$

where $D_{KL}$ is the Kullback-Leibler divergence given by $D_{KL}(q||p)=E_{q}[log{\frac {q}{p}}]$ , q(z|x) is an arbitrary probability distribution over the latent variable z and $p_{\theta ^{t}}(z|x)$ is the posterior probability for z, for the fixed parameters $\theta$ .

The new $q$ is then used in the M-step, which is defined as:

$\theta ^{t+1}=argmax_{\theta }F(q^{t+1},\theta )=argmax_{\theta }\ E_{X}[\sum _{s}q^{t+1}(z|x)\ log\ p_{\theta }(x,z)]$

The goal of this method is to define a way to constrains over posteriors, so that prior information can be set over these posteriors by defining a constraint set $Q_{x}$ of allowed distribution over the latent variables $z$ .

This method addresses this by setting the restriction on $q(z|x)$ . Thus, $Q_{x}$ is defined as:

$Q_{x}=\{q(z|x):\exists _{\xi }\ E_{q}[f(x,z)]-b_{x}\leq \xi ;\left\|\xi \right\|_{2}^{2}<\epsilon ^{2}\}$

where $Q_{x}$ defines a set of valid distributions where some feature expectations $f(x,z)$ are bounded by $b_{x}$ and a free variable $\epsilon$ that denotes the amount of violation allowed.

To apply the constraints, the E-step is redefined as:

$q^{t+1}=argmax_{q\in Q(x)}F(q,\theta ^{t})=argmax_{q\in Q(x)}[-D_{KL}(q||p_{\theta ^{t}}(z|x))]$

The new distribution for $q$ is then used in the M-step, which remains unchanged.

We can see that the key advantage of this method is the fact that the base model remains unchanged, but during learning it is driven to obey the constrains by setting appropriate parameters.

Applications

Posterior regularization has been used to improve Word Alignments in Graca et al, 2010, by defining bijectivity and symmetry constraints over alignment posteriors.

Improvements using Posterior regularization have been report Dependency Grammar Induction for low resource languages in the work by Ganchev et al, 2009

This method has also been used in other applications such as Document Classification, Information Extraction and POS Induction. An tutorial, describing the method and its applications can be found here.

@@ Line 1: / Line 1: @@
+== Original Paper ==
+[http://www.seas.upenn.edu/~kuzman/publications/graca_nips_2007.pdf pdf]
 == Summary ==
 The [[Expectation Maximization]] algorithm is a method for finding the maximum likelihood estimates for the parameters in a statistical model. During the E-step of this algorithm, posterior probabilities are calculated for the latent data by fixing the parameters.
-In many fields, prior knowledge about the posterior probabilities are known and can be applied to the model to improve the statistical model, yet the method to include such in the most efficient way in EM is not clear.
+In many fields, prior knowledge about the posterior probabilities can be applied to the model in order to improve the statistical model, yet the method to include such in the most efficient way in EM is not clear.
-Posterior Regularization is a [[Category::method]] used to impose such contraints on posteriors in the [[AddressesProblem::Expectation Maximization]] algorithm, allowing a finer-level control over these posteriors. The key advantage of this method is the fact that the base model remains unchanged, but during learning it is driven to obey the constrains that are set.
+Posterior Regularization is a [[Category::method]] used to impose such constraints on posteriors in the [[AddressesProblem::Expectation Maximization]] algorithm, allowing a finer-level control over these posteriors.
 == Method Description ==
@@ Line 24: / Line 27: @@
 </math>
-The goal of this method is to define a way to constrains over posteriors, so that prior information can be set over these posteriors by defining a contraint set <math>Q(x)</math>. This method addresses this, by
+The goal of this method is to define a way to constrains over posteriors, so that prior information can be set over these posteriors by defining a constraint set <math>Q_x</math> of allowed distribution over the latent variables <math>z</math>.
+This method addresses this by setting the restriction on <math>q(z|x)</math>. Thus, <math>Q_x</math> is defined as:
+<math>
+Q_x = \{ q(z|x):\exists_\xi \ E_q[f(x,z)] - b_x \le \xi; \left \| \xi \right \|_2^2 < \epsilon^2 \}
+</math>
+where <math>Q_x</math> defines a set of valid distributions where some feature expectations <math>f(x,z)</math> are bounded by <math>b_x</math> and a free variable <math>\epsilon</math> that denotes the amount of violation allowed.
+To apply the constraints, the E-step is redefined as:
+<math>
+q^{t+1} = argmax_{q\in Q(x)} F(q,\theta^t) = argmax_{q\in Q(x)} [-D_{KL}(q||p_{\theta^t}(z|x))]
+</math>
+The new distribution for <math>q</math> is then used in the M-step, which remains unchanged.
+We can see that the key advantage of this method is the fact that the base model remains unchanged, but during learning it is driven to obey the constrains by setting appropriate parameters.
 == Applications ==
-Posterior regularization has been used to improve [[Word Alignments]] in [www.seas.upenn.edu/~taskar/pubs/cl10.pdfSimilar Graca et al, 2010], by defining bijectivity and symmetry constraints over alignment posteriors.
+Posterior regularization has been used to improve [[Word Alignments]] in [http://www.seas.upenn.edu/~taskar/pubs/cl10.pdfSimilar Graca et al, 2010], by defining bijectivity and symmetry constraints over alignment posteriors.
+Improvements using Posterior regularization have been report [[Dependency Grammar Induction]] for low resource languages in the work by [http://www.seas.upenn.edu/~kuzman/publications/ganchev_acl_2009.pdf Ganchev et al, 2009]
+This method has also been used in other applications such as [[Document Classification]], [[Information Extraction]] and [[POS Induction]]. An tutorial, describing the method and its applications can be found [http://sideinfo.wikkii.com/wiki/Main_Page here].

Difference between revisions of "Posterior Regularization for Expectation Maximization"

Latest revision as of 14:11, 30 September 2011

Contents

Original Paper

Summary

Method Description

Applications

Navigation menu

Page actions

Page actions

Personal tools

Navigation

Search

Tools