# Hierarchical Dirichlet process

A hierarchical Dirichlet process is a distribution over a set of random probability measures over a measurable space ${\displaystyle (\Theta ,{\mathcal {B}})}$. The process defines a set of random probability measures ${\displaystyle G_{j}}$, one for each group, and a global random probability measure ${\displaystyle G_{0}}$. The global measure ${\displaystyle G_{0}}$ is distributed as a Dirichlet process with concentration parameter ${\displaystyle \gamma }$ and base probability measure ${\displaystyle H}$:

${\displaystyle G_{0}|\gamma ,H\sim DP(\gamma ,H)}$

and the random measures ${\displaystyle G_{j}}$ are conditionally independent given ${\displaystyle G_{0}}$, each distributed as a Dirichlet process with concentration parameter ${\displaystyle \alpha _{0}}$ and base probability measure ${\displaystyle G_{0}}$:

${\displaystyle G_{j}|\alpha _{0},G_{0}\sim DP(\alpha _{0},G_{0})}$.

A hierarchical Dirichlet process can be used as the prior distribution over the factors for grouped data. For each group ${\displaystyle j}$, let ${\displaystyle \theta _{j1},\theta _{j2},...}$ be i.i.d. random variables distributed as ${\displaystyle G_{j}}$. Each ${\displaystyle \theta _{ji}}$ is a factor corresponding to a single observation ${\displaystyle x_{ji}}$. The likelihood is given by:

${\displaystyle \theta _{ji}|G_{j}\sim G_{j}}$

${\displaystyle x_{ji}|\theta _{ji}\sim F(\theta _{ji})}$.

The hierarchical Dirichlet process can readily be extended to more than two levels. That is, the base measure H can itself be a draw from a DP, and the hierarchy can be extended for as many levels as are deemed useful.

## The stick-breaking construction

Given that the global measure ${\displaystyle G_{0}}$ is distributed as a Dirichlet process, it can be expressed using a stick-breaking representation:

${\displaystyle G_{0}=\sum _{k=1}^{\infty }\beta _{k}\delta _{\phi _{k}},}$

where the atoms ${\displaystyle \phi _{k}\sim H}$ are drawn independently, the weights are ${\displaystyle \beta =(\beta _{k})_{k=1}^{\infty }\sim GEM(\gamma )}$, and the atoms and weights are mutually independent. Since ${\displaystyle G_{0}}$ has support at the points ${\displaystyle \phi =(\phi _{k})_{k=1}^{\infty }}$, each ${\displaystyle G_{j}}$ necessarily has support at these points as well, and can thus be written as:

${\displaystyle G_{j}=\sum _{k=1}^{\infty }\pi _{jk}\delta _{\phi _{k}}}$

Let ${\displaystyle \pi _{j}=(\pi _{jk})_{k=1}^{\infty }}$. Note that the weights ${\displaystyle \pi _{j}}$ are independent given ${\displaystyle \beta }$ (since the ${\displaystyle G_{j}}$ are independent given ${\displaystyle G_{0}}$). Each ${\displaystyle \pi _{j}}$ is a random reweighting of the same atoms, with a distribution that depends on the global weights ${\displaystyle \beta }$.
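The global weights can be simulated by breaking a unit-length stick. The sketch below uses the standard definition of ${\displaystyle GEM(\gamma )}$, in which ${\displaystyle \beta '_{k}\sim \mathrm {Beta} (1,\gamma )}$ and ${\displaystyle \beta _{k}=\beta '_{k}\prod _{l=1}^{k-1}(1-\beta '_{l})}$, and it truncates the infinite sequence after a fixed number of sticks. The function name `sample_gem` and the `truncation` argument are illustrative choices, not part of the model.

```python
import random


def sample_gem(gamma, truncation, rng=None):
    """Draw the first `truncation` weights of beta ~ GEM(gamma).

    Returns the weights and the length of stick left over, which is the
    total mass of all components beyond the truncation level.
    """
    rng = rng or random.Random()
    weights = []
    remaining = 1.0  # length of stick not yet broken off
    for _ in range(truncation):
        fraction = rng.betavariate(1.0, gamma)  # beta'_k ~ Beta(1, gamma)
        weights.append(remaining * fraction)    # beta_k
        remaining *= 1.0 - fraction
    return weights, remaining


beta, leftover = sample_gem(gamma=2.0, truncation=50, rng=random.Random(0))
print(len(beta), sum(beta) + leftover)  # the sum equals 1 up to rounding
```

Smaller values of `gamma` concentrate the mass on the first few atoms, while larger values spread it over many atoms.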

An equivalent representation of the hierarchical Dirichlet process mixture model, written in terms of the weights and indicator variables ${\displaystyle z_{ji}}$, is:

${\displaystyle \beta |\gamma \sim GEM(\gamma )}$

${\displaystyle \pi _{j}|\alpha _{0},\beta \sim DP(\alpha _{0},\beta )}$

${\displaystyle z_{ji}|\pi _{j}\sim \pi _{j}}$

${\displaystyle \phi _{k}|H\sim H}$

${\displaystyle x_{ji}|z_{ji},(\phi _{k})_{k=1}^{\infty }\sim F(\phi _{z_{ji}})}$.

The group weights ${\displaystyle \pi _{j}}$ can be related explicitly to the global weights ${\displaystyle \beta }$. For each ${\displaystyle k}$:

${\displaystyle {\frac {1}{1-\sum _{l=1}^{k-1}\pi _{jl}}}(\pi _{jk},\sum _{l=k+1}^{\infty }\pi _{jl})\sim Dir(\alpha _{0}\beta _{k},\alpha _{0}\sum _{l=k+1}^{\infty }\beta _{l})}$.
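Because a two-component Dirichlet distribution is a Beta distribution, this relation gives a stick-breaking recipe for each group: ${\displaystyle \pi '_{jk}\sim \mathrm {Beta} (\alpha _{0}\beta _{k},\alpha _{0}(1-\sum _{l=1}^{k}\beta _{l}))}$ and ${\displaystyle \pi _{jk}=\pi '_{jk}\prod _{l=1}^{k-1}(1-\pi '_{jl})}$. The following is a minimal sketch of that recipe for a truncated list of global weights. The function name `sample_group_weights` and the hard-coded example values of `beta` are assumptions made for illustration.

```python
import random


def sample_group_weights(beta, alpha0, rng=None):
    """Draw truncated group weights pi_j given global weights beta.

    `beta` is a finite list of global weights whose sum is at most 1.
    Returns the group weights and the mass left beyond the truncation.
    """
    rng = rng or random.Random()
    pi = []
    remaining = 1.0  # 1 - sum of the pi_jl already assigned
    beta_tail = 1.0  # becomes 1 - sum_{l<=k} beta_l inside the loop
    for b in beta:
        beta_tail -= b
        a = alpha0 * b
        c = alpha0 * beta_tail
        if a <= 0.0:
            fraction = 0.0  # atom has no global mass
        elif c <= 0.0:
            fraction = 1.0  # no global mass remains after this atom
        else:
            fraction = rng.betavariate(a, c)  # pi'_jk
        pi.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return pi, remaining


beta = [0.5, 0.3, 0.15]  # example global weights; 0.05 lies beyond the truncation
for j in range(3):
    pi_j, rest = sample_group_weights(beta, alpha0=5.0, rng=random.Random(j))
    print(j, [round(p, 3) for p in pi_j], round(rest, 3))
```

Larger values of `alpha0` keep each group's weights closer to `beta`, while smaller values let the groups deviate more strongly from the global weights.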

## Chinese restaurant franchise

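In the Chinese restaurant franchise metaphor, the restaurants correspond to groups and the customers correspond to the factors ${\displaystyle \theta _{ji}}$. Let ${\displaystyle \phi _{1},...,\phi _{K}}$ denote K i.i.d. random variables distributed according to H; this is the global menu of dishes shared by all restaurants. The variables ${\displaystyle \psi _{jt}}$ represent the table-specific choice of dishes; in particular, ${\displaystyle \psi _{jt}}$ is the dish served at table t in restaurant j. The process is described in terms of counts of customers and counts of tables. Let ${\displaystyle n_{jtk}}$ denote the number of customers in restaurant j at table t eating dish k, and let ${\displaystyle m_{jk}}$ denote the number of tables in restaurant j serving dish k. Marginal counts are written with dots: ${\displaystyle n_{jt.}}$ is the number of customers at table t in restaurant j, ${\displaystyle m_{j.}}$ is the number of tables in restaurant j, ${\displaystyle m_{.k}}$ is the number of tables serving dish k, and ${\displaystyle m_{..}}$ is the total number of occupied tables. Then,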

${\displaystyle \theta _{ji}|\theta _{j1},...,\theta _{j,i-1},\alpha _{0},G_{0}\sim \sum _{t=1}^{m_{j.}}{\frac {n_{jt.}}{i-1+\alpha _{0}}}\delta _{\psi _{jt}}+{\frac {\alpha _{0}}{i-1+\alpha _{0}}}G_{0}}$,

${\displaystyle \psi _{jt}|\psi _{11},\psi _{12},...,\psi _{21},...,\psi _{j,t-1},\gamma ,H\sim \sum _{k=1}^{K}{\frac {m_{.k}}{m_{..}+\gamma }}\delta _{\phi _{k}}+{\frac {\gamma }{m_{..}+\gamma }}H}$
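The two conditional distributions can be read as a seating procedure, sketched below under simplifying assumptions. The simulation tracks only dish indices and counts, since the base measure H is left unspecified here; drawing an actual ${\displaystyle \phi _{k}\sim H}$ for each new dish would be a separate step. The names `simulate_crf` and `group_sizes` are hypothetical.

```python
import random


def simulate_crf(group_sizes, alpha0, gamma, rng=None):
    """Seat customers restaurant by restaurant, following the two rules above.

    Returns, for each restaurant j, the pair (n_jt. for each table,
    dish index served at each table), plus the list of m_.k values.
    """
    rng = rng or random.Random()
    dish_tables = []  # m_.k: number of tables, over all restaurants, serving dish k
    seating = []
    for n_customers in group_sizes:
        table_counts = []  # n_jt.: customers at each table of this restaurant
        table_dish = []    # index k of the dish psi_jt served at each table
        for _ in range(n_customers):
            # Existing table t has weight n_jt.; a new table has weight alpha0.
            weights = table_counts + [alpha0]
            t = rng.choices(range(len(weights)), weights=weights)[0]
            if t == len(table_counts):
                # New table: dish k has weight m_.k; a new dish has weight gamma.
                dish_weights = dish_tables + [gamma]
                k = rng.choices(range(len(dish_weights)), weights=dish_weights)[0]
                if k == len(dish_tables):
                    dish_tables.append(0)  # a new dish joins the global menu
                dish_tables[k] += 1
                table_counts.append(0)
                table_dish.append(k)
            table_counts[t] += 1
        seating.append((table_counts, table_dish))
    return seating, dish_tables


seating, dish_tables = simulate_crf([20, 30, 10], alpha0=1.0, gamma=1.5,
                                    rng=random.Random(1))
for j, (counts, dishes) in enumerate(seating):
    print("restaurant", j, "table sizes", counts, "dishes", dishes)
print("tables per dish", dish_tables)
```

Because `dish_tables` is shared across restaurants, a dish that is already popular in one restaurant is more likely to be served at new tables in the others, which is how the groups come to share mixture components.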