Expectation Maximization

From Cohen Courses

Expectation Maximization (EM) is an iterative [[Category::method]] for finding the maximum likelihood estimates of the unknown parameters, <math>\phi</math>, of a probabilistic model with latent variables, <math>Z</math>, given observed data, <math>X</math>.

The estimates are found by iteratively alternating between two steps: (1) the Expectation (E)-Step, in which the posterior distribution of the latent variables is calculated as a conditional expectation given the data and the current estimate of the model parameters, and (2) the Maximization (M)-Step, in which the model parameters are re-estimated using the distribution over <math>Z</math>.
  
== Literature ==

[[RelatedPaper::Dempster et al., 1977]] formalize the EM algorithm and provide a proof of convergence. The convergence properties are discussed in detail in [[RelatedPaper::McLachlan and Krishnan, 1997]].
  
An alternative explanation of EM gives the intuition behind alternating between variables in terms of lower-bound maximization ([[RelatedPaper::Neal and Hinton, 1998]] and [[RelatedPaper::Minka, 1998]]). In this formulation, the E-step is regarded as computing a lower bound on the log-likelihood using the posterior distribution over the latent variables, and the M-step as optimizing this bound, resulting in a better estimate for the model parameters.

EM has been shown to be a special case of [[AlternatingMinimization | Alternating Minimization]].

== Algorithm ==

Let <math>X</math> be a set of observations resulting from a probabilistic model with unknown parameters, <math>\phi</math>, and hidden variables <math>Z</math>. We wish to recover the set of parameters <math>\phi</math> that maximizes the likelihood function:

<math>L(\phi) = p(X|\phi) = \sum_{Z} p(X, Z|\phi)</math>

We assume that directly optimizing <math>p(X|\phi)</math> is difficult, and that working with the complete-data likelihood <math>p(X, Z|\phi)</math> is easier.

The algorithm maximizes the likelihood of the parameters <math>\phi</math> given the data <math>X</math>, marginalizing over the latent variables <math>Z</math>. This is done by alternating between two steps:
# E-Step: Determine the conditional expectation
#: <math> \operatorname{E}_{Z|X, \phi_{(t)}}\left[ \ln p(X,Z | \phi) \right] </math>
# M-Step: Maximize this with respect to <math>\phi</math>
#: <math> \phi_{(t+1)} = \underset{\phi}{\operatorname{arg\,max}} \left\{ \operatorname{E}_{Z|X, \phi_{(t)}}\left[ \ln p(X,Z | \phi) \right] \right\} </math>

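As a concrete sketch of these two steps, here is a minimal EM loop for a mixture of two biased coins with equal mixing weights (the classic worked example from Do and Batzoglou's EM tutorial); the function name, data, and initialization below are illustrative choices, not part of this page:

```python
def em_two_coins(counts, theta_a, theta_b, iters=50):
    """EM for a mixture of two biased coins with equal mixing weights.

    counts: list of (heads, tails) pairs, one per observed flip sequence.
    The identity of the coin behind each sequence is the latent variable Z;
    phi = (theta_a, theta_b) are the coins' head probabilities.
    """
    for _ in range(iters):
        # E-step: posterior P(Z = A | sequence, phi_t) for each sequence,
        # then the expected head/tail counts attributed to each coin.
        heads_a = tails_a = heads_b = tails_b = 0.0
        for h, t in counts:
            like_a = theta_a ** h * (1 - theta_a) ** t
            like_b = theta_b ** h * (1 - theta_b) ** t
            w_a = like_a / (like_a + like_b)
            heads_a += w_a * h
            tails_a += w_a * t
            heads_b += (1 - w_a) * h
            tails_b += (1 - w_a) * t
        # M-step: phi_{t+1} maximizes the expected complete-data
        # log-likelihood; for Bernoulli coins this is a ratio of counts.
        theta_a = heads_a / (heads_a + tails_a)
        theta_b = heads_b / (heads_b + tails_b)
    return theta_a, theta_b

# Five sequences of ten flips each, summarized as (heads, tails).
data = [(5, 5), (9, 1), (8, 2), (4, 6), (7, 3)]
theta_a, theta_b = em_two_coins(data, 0.6, 0.5)
```

With this data and initialization the estimates settle near the values reported in that tutorial (roughly <math>\theta_A \approx 0.80</math>, <math>\theta_B \approx 0.52</math>); each iteration is guaranteed not to decrease the likelihood <math>L(\phi)</math>.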
== Alternative Formulation ==

EM can be seen as a process where the two steps increase the same function over alternating variables. Following the formulation of [[RelatedPaper::Neal and Hinton, 1998]], the function being maximized is
:<math> F(\tilde{P},\phi) = \operatorname{E}_{\tilde{P}} [ \log p(X, Z|\phi) ] + H(\tilde{P}) </math>
where <math>\tilde{P}</math> is the distribution over the latent variables, <math>Z</math>, and <math>H(\tilde{P})</math> is the entropy of the distribution <math>\tilde{P}</math> given by
:<math>H(\tilde{P}) = -\operatorname{E}_\tilde{P} [ \log \tilde{P}(Z) ] </math>
Note that <math>F</math> is defined for a fixed value of the observed data, <math>X</math>.

The function <math>F</math> can also be related to the [http://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence Kullback–Leibler (KL) divergence] between <math>\tilde{P}(Z)</math> and <math>p_\phi(Z)=p(Z|X,\phi)</math> as
:<math> F(\tilde{P},\phi) = -D_{\text{KL}}(\tilde{P}||p_\phi) + \log L(\phi) </math>
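This identity follows in a few lines from the definition of <math>F</math> together with the factorization <math>p(X,Z|\phi) = p(Z|X,\phi)\, p(X|\phi)</math>:

:<math>
\begin{align}
F(\tilde{P},\phi) &= \operatorname{E}_{\tilde{P}} [ \log p(X, Z|\phi) ] - \operatorname{E}_{\tilde{P}} [ \log \tilde{P}(Z) ] \\
&= \operatorname{E}_{\tilde{P}} \left[ \log \frac{p(Z|X,\phi)\, p(X|\phi)}{\tilde{P}(Z)} \right] \\
&= -\operatorname{E}_{\tilde{P}} \left[ \log \frac{\tilde{P}(Z)}{p(Z|X,\phi)} \right] + \log p(X|\phi) \\
&= -D_{\text{KL}}(\tilde{P} \| p_\phi) + \log L(\phi)
\end{align}
</math>

Since the KL divergence is non-negative, <math>F(\tilde{P},\phi) \le \log L(\phi)</math>, with equality exactly when <math>\tilde{P} = p_\phi</math>; this is the lower-bound view of EM.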
Under this formulation, the two steps of the EM algorithm are defined as:
# E-Step: Find the distribution <math>\tilde{P}</math> that maximizes <math>F</math>
#: <math> \tilde{P}_{(t)} = \underset{\tilde{P}}{\operatorname{arg\,max}} \left\{ F(\tilde{P}, \phi_{(t-1)}) \right\} </math>
# M-Step: Find the values of <math>\phi</math> that maximize <math>F</math>
#: <math> \phi_{(t)} = \underset{\phi}{\operatorname{arg\,max}} \left\{ F(\tilde{P}_{(t)}, \phi) \right\} </math>
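The E-step claim can be checked numerically for a discrete latent variable: at fixed <math>\phi</math>, <math>F</math> is maximized by taking <math>\tilde{P}</math> to be the exact posterior <math>p(Z|X,\phi)</math>, at which point <math>F = \log L(\phi)</math>. A small sketch (the joint probabilities below are made-up numbers for one fixed observation <math>X</math>):

```python
import math

def F(p_tilde, joint):
    """F(P~, phi) = E_{P~}[log p(X, Z | phi)] + H(P~) for one fixed X.

    p_tilde: distribution over the discrete latent variable Z.
    joint:   p(X, Z = z | phi) for the same values of z.
    """
    return sum(q * (math.log(p) - math.log(q))
               for q, p in zip(p_tilde, joint) if q > 0)

# Made-up joint p(X, Z | phi) over Z in {0, 1} for one observed X.
joint = [0.3, 0.1]
likelihood = sum(joint)                      # L(phi) = p(X | phi)
posterior = [p / likelihood for p in joint]  # p(Z | X, phi)

# The posterior attains the bound F = log L(phi) ...
assert abs(F(posterior, joint) - math.log(likelihood)) < 1e-12
# ... and any other distribution over Z gives a strictly smaller F.
assert F([0.5, 0.5], joint) < F(posterior, joint)
```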
  
 
== Relevant Papers ==
{{#ask: [[UsesMethod::Expectation Maximization]]
| mainlabel = Paper that uses EM
| ?AddressesProblem = Addresses Problem
| ?UsesDataset = Uses Dataset
}}
  
=== Comments ===

If you're curious to learn even more about the alternating minimization view of EM, check out the first several chapters of Wainwright and Jordan, http://www.eecs.berkeley.edu/~wainwrig/Papers/WaiJor08_FTML.pdf  --[[User:Brendan|Brendan]] 14:43, 11 October 2011 (UTC)
