Difference between revisions of "Teh et, JASA2006"

Revision as of 15:23, 31 March 2011

Citation

Y. Teh, M. Jordan, M. Beal, and D. Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 2006

Online version

Matthew Beal's papers

Summary

This paper proposes a nonparametric Bayesian approach to determining the number of mixture components in grouped data. The basic ideas are:

Develop, for the hierarchical Dirichlet process, analogs of the Dirichlet process's stick-breaking and Chinese restaurant representations; the latter is called the "Chinese restaurant franchise".

Use an MCMC algorithm for posterior inference in hierarchical Dirichlet process mixture models.

Methodology

A hierarchical Dirichlet process is a distribution over a set of random probability measures over a measurable space $(\Theta ,{\mathcal {B}})$. The process defines a set of random probability measures $G_{j}$, one for each group, and a global random probability measure $G_{0}$. The global measure $G_{0}$ is distributed as a Dirichlet process with concentration parameter $\gamma$ and base probability measure $H$:

$G_{0}|\gamma ,H\sim DP(\gamma ,H)$

and the random measures $G_{j}$ are conditionally independent given $G_{0}$, with distributions given by a Dirichlet process with concentration parameter $\alpha _{0}$ and base probability measure $G_{0}$:

$G_{j}|\alpha _{0},G_{0}\sim DP(\alpha _{0},G_{0})$ .
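
As a supplementary note (a standard Dirichlet process property, not taken from this summary), the roles of the two concentration parameters can be read off the first two moments. For any measurable set $A$:

$E[G_{0}(A)]=H(A),\quad Var[G_{0}(A)]={\frac {H(A)(1-H(A))}{\gamma +1}}$

$E[G_{j}(A)|G_{0}]=G_{0}(A),\quad Var[G_{j}(A)|G_{0}]={\frac {G_{0}(A)(1-G_{0}(A))}{\alpha _{0}+1}}$

So $\gamma$ controls how closely $G_{0}$ follows $H$, and $\alpha _{0}$ controls how closely each group-level measure $G_{j}$ follows the shared measure $G_{0}$.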

A hierarchical Dirichlet process can be used as the prior distribution over the factors for grouped data. For each group $j$, let $\theta _{j1},\theta _{j2},...$ be i.i.d. random variables distributed as $G_{j}$. Each $\theta _{ji}$ is a factor corresponding to a single observation $x_{ji}$. The resulting model is:

$\theta _{ji}|G_{j}\sim G_{j}$

$x_{ji}|\theta _{ji}\sim F(\theta _{ji})$ .

The hierarchical Dirichlet process can readily be extended to more than two levels. That is, the base measure H can itself be a draw from a DP, and the hierarchy can be extended for as many levels as are deemed useful.

The stick-breaking construction

Given that the global measure $G_{0}$ is distributed as a Dirichlet process, it can be expressed using a stick-breaking representation:

$G_{0}=\sum _{k=1}^{\infty }\beta _{k}\delta _{\phi _{k}},$

where the atoms $\phi _{k}\sim H$ are drawn independently, the weights are $\beta =(\beta _{k})_{k=1}^{\infty }\sim GEM(\gamma )$, and the two sequences are mutually independent. Since $G_{0}$ has support at the points $\phi =(\phi _{k})_{k=1}^{\infty }$, each $G_{j}$ necessarily has support at these points as well, and can thus be written as:

$G_{j}=\sum _{k=1}^{\infty }\pi _{jk}\delta _{\phi _{k}}$
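
The stick-breaking representation of $G_{0}$ can be sketched numerically. The following is a minimal illustration, not the paper's code: it keeps only the first `num_sticks` weights (a truncation introduced here for illustration), uses the standard GEM construction with Beta(1, $\gamma$) stick proportions, and takes a standard normal as a stand-in for the unspecified base measure $H$. The function name `gem_weights` is hypothetical.

```python
import numpy as np

def gem_weights(gamma, num_sticks, rng):
    """Return the first `num_sticks` weights of a GEM(gamma) draw."""
    # Stick proportions: beta'_k ~ Beta(1, gamma), independently.
    proportions = rng.beta(1.0, gamma, size=num_sticks)
    # Stick length left before break k: prod_{l<k} (1 - beta'_l).
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - proportions)[:-1]))
    # beta_k = beta'_k * prod_{l<k} (1 - beta'_l).
    return proportions * remaining

rng = np.random.default_rng(0)
beta = gem_weights(gamma=2.0, num_sticks=50, rng=rng)
# Atoms phi_k ~ H; H = N(0, 1) is an assumption made for this sketch.
phi = rng.normal(0.0, 1.0, size=50)
print(beta[:5], beta.sum())
```

The truncated weights sum to slightly less than one; the missing mass belongs to the sticks beyond the truncation level.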

Let $\pi _{j}=(\pi _{jk})_{k=1}^{\infty }$. Note that the weights $\pi _{j}$ are independent given $\beta$ (since the $G_{j}$ are independent given $G_{0}$). The group-level weights $\pi _{j}$ are tied to the global weights $\beta$ through the relations given next.

An equivalent representation of the hierarchical Dirichlet process mixture, written with indicator variables $z_{ji}$ that select the mixture component of each observation, is:

$\beta |\gamma \sim GEM(\gamma )$

$\pi _{j}|\alpha _{0},\beta \sim DP(\alpha _{0},\beta )$

$z_{ji}|\pi _{j}\sim \pi _{j}$

$\phi _{k}|H\sim H$

$x_{ji}|z_{ji},(\phi _{k})_{k=1}^{\infty }\sim F(\phi _{z_{ji}})$.
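
The indicator-variable representation reads directly as a recipe for generating grouped data. The sketch below is an illustration under several assumptions that are not in the summary: the model is truncated to `truncation` components (so that $DP(\alpha _{0},\beta )$ reduces to a finite Dirichlet distribution with parameters $\alpha _{0}\beta _{k}$), $H$ is taken to be $N(0,5^{2})$, and $F(\phi )$ is taken to be $N(\phi ,1)$. The function name `sample_hdp_mixture` is hypothetical.

```python
import numpy as np

def sample_hdp_mixture(gamma, alpha0, group_sizes, truncation, rng):
    """Generate grouped data from a truncated HDP mixture (illustrative sketch)."""
    # beta | gamma ~ GEM(gamma), truncated: the last stick takes all leftover mass.
    sticks = rng.beta(1.0, gamma, size=truncation)
    sticks[-1] = 1.0
    beta = sticks * np.concatenate(([1.0], np.cumprod(1.0 - sticks[:-1])))
    # phi_k | H ~ H, with H = N(0, 5^2) assumed for this sketch.
    phi = rng.normal(0.0, 5.0, size=truncation)
    groups = []
    for n_j in group_sizes:
        # pi_j | alpha0, beta ~ DP(alpha0, beta); on finitely many atoms this is
        # a Dirichlet distribution with parameters alpha0 * beta_k.
        pi_j = rng.dirichlet(alpha0 * beta)
        # z_ji | pi_j ~ pi_j
        z_j = rng.choice(truncation, size=n_j, p=pi_j)
        # x_ji | z_ji, phi ~ F(phi_{z_ji}), with F(phi) = N(phi, 1) assumed.
        x_j = rng.normal(phi[z_j], 1.0)
        groups.append((z_j, x_j))
    return beta, phi, groups

rng = np.random.default_rng(1)
beta, phi, groups = sample_hdp_mixture(
    gamma=1.0, alpha0=3.0, group_sizes=[5, 8, 4], truncation=10, rng=rng
)
for j, (z_j, x_j) in enumerate(groups):
    print(j, z_j, np.round(x_j, 2))
```

Because every group draws its indicators over the same atoms $\phi _{k}$, components are shared across groups, while the weights $\pi _{j}$ differ from group to group.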

After some derivations, the relation between the group-level weights $\pi _{j}$ and the global weights $\beta$ is, for each $k$:

${\frac {1}{1-\sum _{l=1}^{k-1}\pi _{jl}}}(\pi _{jk},\sum _{l=k+1}^{\infty }\pi _{jl})\sim Dir(\alpha _{0}\beta _{k},\alpha _{0}\sum _{l=k+1}^{\infty }\beta _{l})$.
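
Since a two-component Dirichlet distribution is a Beta distribution, the relation above says that the normalized weight $\pi _{jk}/(1-\sum _{l=1}^{k-1}\pi _{jl})$ is Beta distributed with parameters $\alpha _{0}\beta _{k}$ and $\alpha _{0}\sum _{l=k+1}^{\infty }\beta _{l}=\alpha _{0}(1-\sum _{l=1}^{k}\beta _{l})$. This gives a stick-breaking recipe for $\pi _{j}$, sketched below under the assumption that only the first few global weights are supplied. The function name `group_weights` and the small numerical floor on the tail mass are illustrative choices, not part of the paper.

```python
import numpy as np

def group_weights(alpha0, beta, rng):
    """Draw pi_j1..pi_jK given the leading global weights beta_1..beta_K."""
    beta = np.asarray(beta, dtype=float)
    # Tail mass sum_{l>k} beta_l = 1 - sum_{l<=k} beta_l.
    tail = 1.0 - np.cumsum(beta)
    # Hypothetical floor: keeps the second Beta parameter strictly positive.
    tail = np.clip(tail, 1e-12, None)
    # Normalized weights pi'_jk ~ Beta(alpha0 * beta_k, alpha0 * tail_k).
    sticks = rng.beta(alpha0 * beta, alpha0 * tail)
    # pi_jk = pi'_jk * prod_{l<k} (1 - pi'_jl).
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - sticks)[:-1]))
    return sticks * remaining

rng = np.random.default_rng(2)
beta = np.array([0.5, 0.3, 0.1])  # leading global weights; 0.1 of mass left in the tail
pi_j = group_weights(alpha0=5.0, beta=beta, rng=rng)
print(pi_j, pi_j.sum())
```

Averaged over many draws, $\pi _{jk}$ is centered on $\beta _{k}$, and a larger $\alpha _{0}$ makes each group's weights track the global weights more closely.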

