Teh et, JASA2006
Revision as of 14:23, 31 March 2011
Citation
Y. Teh, M. Jordan, M. Beal, and D. Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 2006
Online version
Summary
This paper proposes a nonparametric Bayesian approach to choosing the number of mixture components in grouped data. The basic idea is:
- Develop analogs for the hierarchical Dirichlet process of the stick-breaking and Chinese-restaurant representations of the Dirichlet process, the latter becoming a "Chinese restaurant franchise".
Methodology
A hierarchical Dirichlet process is a distribution over a set of random probability measures over <math>(\Theta, B)</math>. The process defines a set of random probability measures <math>G_j</math>, one for each group, and a global random probability measure <math>G_0</math>. The global measure <math>G_0</math> is distributed as a Dirichlet process with concentration parameter <math>\gamma</math> and base probability measure <math>H</math>:

<math>G_0 \mid \gamma, H \sim DP(\gamma, H)</math>
and the random measures <math>G_j</math> are conditionally independent given <math>G_0</math>, with distributions given by a Dirichlet process with base probability measure <math>G_0</math>:

<math>G_j \mid \alpha_0, G_0 \sim DP(\alpha_0, G_0)</math>.
A hierarchical Dirichlet process can be used as the prior distribution over the factors for grouped data. For each <math>j</math>, let <math>\theta_{j1}, \theta_{j2}, \ldots</math> be i.i.d. random variables distributed as <math>G_j</math>. Each <math>\theta_{ji}</math> is a factor corresponding to a single observation <math>x_{ji}</math>. The likelihood is given by:

<math>\theta_{ji} \mid G_j \sim G_j</math>

<math>x_{ji} \mid \theta_{ji} \sim F(\theta_{ji})</math>.
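As a rough illustration (not the paper's code), this two-level generative process can be sketched with a truncated stick-breaking approximation. The truncation level K, the choice of Gaussian base measure H = N(0, 10^2), the likelihood F(θ) = N(θ, 1), and all variable names are assumptions made for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

K = 50                    # truncation level: finite approximation to the infinite DP
gamma, alpha0 = 1.0, 1.0  # concentration parameters

# G_0 ~ DP(gamma, H), truncated stick-breaking: weights beta_k and atoms phi_k ~ H
v = rng.beta(1.0, gamma, size=K)
beta = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
phi = rng.normal(0.0, 10.0, size=K)       # atoms drawn from H = N(0, 10^2)

def sample_group(n):
    """G_j ~ DP(alpha0, G_0); then theta_ji ~ G_j and x_ji ~ F(theta_ji)."""
    # Because G_0 is discrete with atoms phi, G_j reweights the same atoms;
    # under the truncation, pi_j ~ Dirichlet(alpha0 * beta).
    pi_j = rng.dirichlet(alpha0 * beta)
    theta = rng.choice(phi, size=n, p=pi_j)   # factors theta_ji
    x = rng.normal(theta, 1.0)                # observations x_ji ~ N(theta_ji, 1)
    return theta, x

theta1, x1 = sample_group(200)
theta2, x2 = sample_group(200)
# The groups draw their factors from the same countable set of atoms,
# which is what the hierarchy buys over independent DP priors per group.
shared = np.intersect1d(np.unique(theta1), np.unique(theta2))
```

With moderate <math>\alpha_0</math>, each group concentrates on a few of <math>G_0</math>'s atoms, so factor values typically recur across groups.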
The hierarchical Dirichlet process can readily be extended to more than two levels. That is, the base measure H can itself be a draw from a DP, and the hierarchy can be extended for as many levels as are deemed useful.
- The stick-breaking construction
Given that the global measure <math>G_0</math> is distributed as a Dirichlet process, it can be expressed using a stick-breaking representation:

<math>G_0 = \sum_{k=1}^{\infty} \beta_k \delta_{\phi_k},</math>

where <math>\phi_k \sim H</math> independently and <math>\beta = (\beta_k)_{k=1}^{\infty} \sim GEM(\gamma)</math> are mutually independent. Since <math>G_0</math> has support at the points <math>\phi = (\phi_k)_{k=1}^{\infty}</math>, each <math>G_j</math> necessarily has support at these points as well, and can thus be written as:

<math>G_j = \sum_{k=1}^{\infty} \pi_{jk} \delta_{\phi_k}.</math>
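The GEM(<math>\gamma</math>) weights can be sampled by the usual stick-breaking recipe, <math>\beta_k = v_k \prod_{l<k}(1-v_l)</math> with <math>v_k \sim Beta(1, \gamma)</math>. A minimal sketch (function and variable names are illustrative):

```python
import numpy as np

def gem(gamma, K, rng):
    """First K weights of beta ~ GEM(gamma) via stick-breaking."""
    v = rng.beta(1.0, gamma, size=K)                          # v_k ~ Beta(1, gamma)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return v * remaining                                      # beta_k = v_k * prod_{l<k}(1 - v_l)

rng = np.random.default_rng(1)
beta = gem(gamma=2.0, K=1000, rng=rng)
# The weights are nonnegative and sum to (essentially) 1, so G_0 is almost
# surely a discrete probability measure concentrated on the atoms phi_k.
```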
Let <math>\pi_j = (\pi_{jk})_{k=1}^{\infty}</math>. Note that the weights <math>\pi_j</math> are independent given <math>\beta</math> (since the <math>G_j</math> are independent given <math>G_0</math>). These weights <math>\pi_j</math> are related to the global weights <math>\beta</math>.
An equivalent representation of the hierarchical Dirichlet process mixture is:

<math>\beta \mid \gamma \sim GEM(\gamma)</math>

<math>\pi_j \mid \alpha_0, \beta \sim DP(\alpha_0, \beta),</math> <math>z_{ji} \mid \pi_j \sim \pi_j</math>

<math>\phi_k \mid H \sim H,</math> <math>x_{ji} \mid z_{ji}, (\phi_k)_{k=1}^{\infty} \sim F(\phi_{z_{ji}})</math>.
After some derivation, the relation between the weights <math>\pi_j</math> and <math>\beta</math> is:

<math>\frac{1}{1 - \sum_{l=1}^{k-1} \pi_{jl}} \left( \pi_{jk}, \sum_{l=k+1}^{\infty} \pi_{jl} \right) \sim Dir\left( \alpha_0 \beta_k, \alpha_0 \sum_{l=k+1}^{\infty} \beta_l \right)</math>.
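This relation says that, after normalizing by the mass not yet assigned, the next weight <math>\pi_{jk}</math> is Beta-distributed, which gives a direct recipe for sampling <math>\pi_j</math> from <math>\beta</math>. A sketch using a toy finite <math>\beta</math> (names are illustrative; for a finite <math>\beta</math> this construction coincides with <math>\pi_j \sim Dir(\alpha_0 \beta)</math>):

```python
import numpy as np

def hdp_group_weights(beta, alpha0, rng):
    """Sample pi_j | beta ~ DP(alpha0, beta) via the implied Beta stick-breaking:

    pi_jk / (1 - sum_{l<k} pi_jl) ~ Beta(alpha0*beta_k, alpha0*sum_{l>k} beta_l).
    """
    K = len(beta)
    tail = beta.sum() - np.cumsum(beta)       # sum_{l>k} beta_l
    pi = np.empty(K)
    stick = 1.0                                # remaining mass 1 - sum_{l<k} pi_jl
    for k in range(K - 1):
        frac = rng.beta(alpha0 * beta[k], alpha0 * tail[k])
        pi[k] = stick * frac
        stick *= 1.0 - frac
    pi[K - 1] = stick                          # remaining mass goes to the last atom
    return pi

rng = np.random.default_rng(2)
beta = rng.dirichlet(np.ones(10))              # a toy finite beta for illustration
pi = np.mean([hdp_group_weights(beta, alpha0=5.0, rng=rng) for _ in range(2000)], axis=0)
# E[pi_jk] = beta_k, so the Monte Carlo average of pi_j should track beta.
```

Larger <math>\alpha_0</math> pulls each group's weights closer to <math>\beta</math>; smaller <math>\alpha_0</math> lets groups deviate more.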