Citation
Y. Teh, M. Jordan, M. Beal, and D. Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476):1566-1581, 2006
Online version
Matthew Beal's papers
Summary
This paper proposes a nonparametric Bayesian approach to deciding the number of mixture components in grouped data. The basic idea is to model each group with a Dirichlet process mixture while tying the groups together through a shared discrete base measure that is itself drawn from a Dirichlet process, so that mixture components are shared across groups and their number is inferred from the data.
Methodology
A hierarchical Dirichlet process is a distribution over a set of random probability measures over a space $(\Theta, \mathcal{B})$. The process defines a set of random probability measures $G_j$, one for each group, and a global random probability measure $G_0$. The global measure $G_0$ is distributed as a Dirichlet process with concentration parameter $\gamma$ and base probability measure $H$:

$$G_0 \mid \gamma, H \sim \mathrm{DP}(\gamma, H),$$

and the random measures $G_j$ are conditionally independent given $G_0$, with distributions given by a Dirichlet process with concentration parameter $\alpha_0$ and base probability measure $G_0$:

$$G_j \mid \alpha_0, G_0 \sim \mathrm{DP}(\alpha_0, G_0).$$
A hierarchical Dirichlet process can be used as the prior distribution over the factors for grouped data. For each $j$ let $\theta_{j1}, \theta_{j2}, \ldots$ be i.i.d. random variables distributed as $G_j$. Each $\theta_{ji}$ is a factor corresponding to a single observation $x_{ji}$. The likelihood is given by:

$$\theta_{ji} \mid G_j \sim G_j, \qquad x_{ji} \mid \theta_{ji} \sim F(\theta_{ji}).$$
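To make the generative process concrete, here is a minimal Python sketch; the standard normal base measure $H$, the Gaussian likelihood $F$, the truncation level $K$, and all names are illustrative assumptions, not choices made in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
gamma, alpha0, K, num_groups = 1.0, 5.0, 20, 3

def stick_breaking(concentration, num_atoms, rng):
    """Truncated stick-breaking weights: beta'_k ~ Beta(1, concentration)."""
    fractions = rng.beta(1.0, concentration, size=num_atoms)
    remainder = np.concatenate(([1.0], np.cumprod(1.0 - fractions[:-1])))
    return fractions * remainder

# Global measure G0 ~ DP(gamma, H), truncated to K atoms: phi_k ~ H = N(0, 1).
phi = rng.normal(0.0, 1.0, size=K)
beta = stick_breaking(gamma, K, rng)

# Group measures G_j ~ DP(alpha0, G0): because the (truncated) base is discrete
# with weights beta, the group weights are exactly Dirichlet(alpha0 * beta).
pi = rng.dirichlet(alpha0 * beta, size=num_groups)

# Factors theta_ji ~ G_j and observations x_ji ~ F(theta_ji) = N(theta_ji, 0.1^2).
def sample_group(weights, num_obs):
    z = rng.choice(K, p=weights, size=num_obs)   # which shared atom each factor is
    theta = phi[z]                               # factors theta_ji
    return rng.normal(theta, 0.1)                # observations x_ji

data = [sample_group(pi[j], num_obs=100) for j in range(num_groups)]
```

Because every $G_j$ reuses the atoms of $G_0$, the same components appear across groups with group-specific weights, which is exactly what makes sharing possible.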
The hierarchical Dirichlet process can readily be extended to more than two levels. That is, the base measure H can itself be a draw from a DP, and the hierarchy can be extended for as many levels as are deemed useful.
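Continuing the sketch above, a hypothetical third level simply replaces $H$ with a draw from another DP; with a discrete (truncated) base, each lower-level DP again reduces to a Dirichlet distribution over the base's atoms:

```python
# Hypothetical three-level hierarchy, reusing stick_breaking, rng, K, gamma,
# alpha0, and num_groups from the sketch above: H is replaced by Q ~ DP(eta, N(0, 1)).
eta = 1.0
q_atoms = rng.normal(0.0, 1.0, size=K)     # atoms of Q, drawn i.i.d. from N(0, 1)
q_weights = stick_breaking(eta, K, rng)

# G0 ~ DP(gamma, Q): G0 keeps Q's atoms, with weights Dirichlet(gamma * q_weights);
# the group-level measures then follow exactly as before.
beta = rng.dirichlet(gamma * q_weights)
pi = rng.dirichlet(alpha0 * beta, size=num_groups)
```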
- The stick-breaking construction
Given that the global measure $G_0$ is distributed as a Dirichlet process, it can be expressed using a stick-breaking representation:

$$G_0 = \sum_{k=1}^{\infty} \beta_k \delta_{\phi_k},$$

where $\phi_k \sim H$ independently and $\boldsymbol{\beta} = (\beta_k)_{k=1}^{\infty} \sim \mathrm{GEM}(\gamma)$, i.e. $\beta_k = \beta'_k \prod_{l=1}^{k-1}(1-\beta'_l)$ with $\beta'_k \sim \mathrm{Beta}(1,\gamma)$, and the $\phi_k$ and $\boldsymbol{\beta}$ are mutually independent. Since $G_0$ has support at the points $(\phi_k)_{k=1}^{\infty}$, each $G_j$ necessarily has support at these points as well, and can thus be written as:

$$G_j = \sum_{k=1}^{\infty} \pi_{jk} \delta_{\phi_k}.$$

Let $\boldsymbol{\pi}_j = (\pi_{jk})_{k=1}^{\infty}$. Note that the weights $\boldsymbol{\pi}_j$ are independent given $\boldsymbol{\beta}$ (since the $G_j$ are independent given $G_0$). These weights $\boldsymbol{\pi}_j$ are related to the global weights by $\boldsymbol{\pi}_j \mid \alpha_0, \boldsymbol{\beta} \sim \mathrm{DP}(\alpha_0, \boldsymbol{\beta})$.

An equivalent representation of the hierarchical Dirichlet process mixture is:

$$z_{ji} \mid \boldsymbol{\pi}_j \sim \boldsymbol{\pi}_j, \qquad x_{ji} \mid z_{ji}, (\phi_k)_{k=1}^{\infty} \sim F(\phi_{z_{ji}}).$$

After some derivations, the relation between the weights $\boldsymbol{\pi}_j$ and $\boldsymbol{\beta}$ is:

$$\pi'_{jk} \sim \mathrm{Beta}\!\left(\alpha_0 \beta_k,\; \alpha_0\Big(1 - \sum_{l=1}^{k} \beta_l\Big)\right), \qquad \pi_{jk} = \pi'_{jk} \prod_{l=1}^{k-1} \left(1 - \pi'_{jl}\right).$$
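This relation can be exercised numerically; a minimal sketch with a fixed truncation (variable names are mine, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)
gamma, alpha0, K = 1.0, 5.0, 30

# Global weights beta ~ GEM(gamma), truncated at K sticks.
b = rng.beta(1.0, gamma, size=K)
beta = b * np.concatenate(([1.0], np.cumprod(1.0 - b[:-1])))

# Group weights via the relation above:
# pi'_jk ~ Beta(alpha0 * beta_k, alpha0 * (1 - sum_{l<=k} beta_l)),
# pi_jk  = pi'_jk * prod_{l<k} (1 - pi'_jl).
tail = np.clip(1.0 - np.cumsum(beta), 1e-12, None)  # guard float round-off
frac = rng.beta(alpha0 * beta, alpha0 * tail)
pi_j = frac * np.concatenate(([1.0], np.cumprod(1.0 - frac[:-1])))

# Sanity check: E[pi_jk] = beta_k, so averages over many draws track beta.
print(beta[:5])
print(pi_j[:5])
```

Averaging `pi_j` over many independent draws recovers `beta`, reflecting that $G_0$ is the mean measure of each $G_j$.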
- Chinese restaurant franchise
The restaurants correspond to groups and the customers correspond to the factors $\theta_{ji}$. Let $\phi_1, \ldots, \phi_K$ denote $K$ i.i.d. random variables distributed according to $H$; this is the global menu of dishes. Variables $\psi_{jt}$ represent the table-specific choice of dishes; in particular, $\psi_{jt}$ is the dish served at table $t$ in restaurant $j$. To maintain counts of customers and counts of tables, we use the notation $n_{jtk}$ for the number of customers in restaurant $j$ at table $t$ eating dish $k$, and $m_{jk}$ for the number of tables in restaurant $j$ serving dish $k$; marginal counts are written with dots, e.g. $n_{jt\cdot}$ and $m_{\cdot k}$. Then,

$$\theta_{ji} \mid \theta_{j1}, \ldots, \theta_{j,i-1}, \alpha_0, G_0 \sim \sum_{t=1}^{m_{j\cdot}} \frac{n_{jt\cdot}}{i-1+\alpha_0}\, \delta_{\psi_{jt}} + \frac{\alpha_0}{i-1+\alpha_0}\, G_0,$$

$$\psi_{jt} \mid \psi_{11}, \psi_{12}, \ldots, \psi_{21}, \ldots, \psi_{j,t-1}, \gamma, H \sim \sum_{k=1}^{K} \frac{m_{\cdot k}}{m_{\cdot\cdot}+\gamma}\, \delta_{\phi_k} + \frac{\gamma}{m_{\cdot\cdot}+\gamma}\, H.$$
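A forward simulation of the seating process makes the counts $n_{jt\cdot}$ and $m_{\cdot k}$ concrete; this is a minimal sketch of the metaphor, with illustrative function and variable names:

```python
import numpy as np

rng = np.random.default_rng(2)
alpha0, gamma = 1.0, 1.5

def franchise(num_groups, customers_per_group, rng):
    """Simulate table and dish assignments in a Chinese restaurant franchise.

    Dishes are shared across restaurants via the global counts m_.k.
    """
    dish_tables = []                  # dish_tables[k] = m_.k, tables serving dish k
    table_dish, cust_table = [], []
    for j in range(num_groups):
        table_counts = []             # table_counts[t] = n_jt., customers at table t
        dishes, seats = [], []
        for i in range(customers_per_group):
            # Sit at table t w.p. prop. to n_jt.; open a new table w.p. prop. to alpha0.
            probs = np.array(table_counts + [alpha0], dtype=float)
            t = rng.choice(len(probs), p=probs / probs.sum())
            if t == len(table_counts):            # new table: choose its dish
                # Dish k w.p. prop. to m_.k; a brand-new dish w.p. prop. to gamma.
                dprobs = np.array(dish_tables + [gamma], dtype=float)
                k = rng.choice(len(dprobs), p=dprobs / dprobs.sum())
                if k == len(dish_tables):
                    dish_tables.append(0)
                dish_tables[k] += 1
                table_counts.append(0)
                dishes.append(k)
            table_counts[t] += 1
            seats.append(t)
        table_dish.append(dishes)
        cust_table.append(seats)
    return table_dish, cust_table

table_dish, cust_table = franchise(num_groups=3, customers_per_group=50, rng=rng)
```

Running this shows the rich-get-richer behavior at both levels: popular tables attract more customers within a restaurant, and popular dishes are served at more tables across the whole franchise.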
MCMC algorithm for posterior sampling in the Chinese restaurant franchise
The Chinese restaurant franchise yields a Gibbs sampling scheme for posterior sampling given observations $\mathbf{x}$. Rather than dealing with the $\theta_{ji}$'s and $\psi_{jt}$'s directly, we sample their index variables $t_{ji}$ (the table assignment of customer $i$ in restaurant $j$) and $k_{jt}$ (the dish assignment of table $t$ in restaurant $j$) instead.
- Sampling $t$. The prior probability that $t_{ji}$ takes on a particular previously used value $t$ is proportional to $n_{jt\cdot}^{-ji}$, whereas the probability that it takes on a new value (say $t^{\mathrm{new}} = m_{j\cdot}+1$) is proportional to $\alpha_0$. The likelihood due to $x_{ji}$ given $t_{ji} = t$ for some previously used $t$ is $f_{k_{jt}}^{-x_{ji}}(x_{ji})$, the conditional density of $x_{ji}$ under component $k_{jt}$ given all other data items assigned to that component. Then

$$p(t_{ji} = t \mid \mathbf{t}^{-ji}, \mathbf{k}) \propto \begin{cases} n_{jt\cdot}^{-ji}\, f_{k_{jt}}^{-x_{ji}}(x_{ji}) & \text{if } t \text{ previously used}, \\ \alpha_0\, p(x_{ji} \mid \mathbf{t}^{-ji}, t_{ji} = t^{\mathrm{new}}, \mathbf{k}) & \text{if } t = t^{\mathrm{new}}, \end{cases}$$

where the likelihood of $x_{ji}$ at a new table mixes over the dish prior:

$$p(x_{ji} \mid \mathbf{t}^{-ji}, t_{ji} = t^{\mathrm{new}}, \mathbf{k}) = \sum_{k=1}^{K} \frac{m_{\cdot k}}{m_{\cdot\cdot}+\gamma}\, f_k^{-x_{ji}}(x_{ji}) + \frac{\gamma}{m_{\cdot\cdot}+\gamma}\, f_{k^{\mathrm{new}}}^{-x_{ji}}(x_{ji}).$$

If the sampled value of $t_{ji}$ is $t^{\mathrm{new}}$, we obtain a sample of $k_{jt^{\mathrm{new}}}$ by sampling from

$$p(k_{jt^{\mathrm{new}}} = k \mid \mathbf{t}, \mathbf{k}^{-jt^{\mathrm{new}}}) \propto \begin{cases} m_{\cdot k}\, f_k^{-x_{ji}}(x_{ji}) & \text{if } k \text{ previously used}, \\ \gamma\, f_{k^{\mathrm{new}}}^{-x_{ji}}(x_{ji}) & \text{if } k = k^{\mathrm{new}}. \end{cases}$$

If as a result of updating $t_{ji}$ some table $t$ becomes unoccupied, i.e., $n_{jt\cdot} = 0$, then the probability that this table will be reoccupied in the future is zero, since this is always proportional to $n_{jt\cdot}$. As a result, we may delete the corresponding $k_{jt}$ from the data structure. If as a result of deleting $k_{jt}$ some mixture component $k$ becomes unallocated, we delete this mixture component as well. (A code sketch of this conditional, together with the one for sampling $k$ below, follows the list.)
- Sampling $k$. Since changing $k_{jt}$ actually changes the component membership of all data items in table $t$, the likelihood obtained by setting $k_{jt} = k$ is given by $f_k^{-\mathbf{x}_{jt}}(\mathbf{x}_{jt})$, where $\mathbf{x}_{jt}$ denotes all data items at table $t$ in restaurant $j$, so that the conditional probability of $k_{jt}$ is

$$p(k_{jt} = k \mid \mathbf{t}, \mathbf{k}^{-jt}) \propto \begin{cases} m_{\cdot k}^{-jt}\, f_k^{-\mathbf{x}_{jt}}(\mathbf{x}_{jt}) & \text{if } k \text{ previously used}, \\ \gamma\, f_{k^{\mathrm{new}}}^{-\mathbf{x}_{jt}}(\mathbf{x}_{jt}) & \text{if } k = k^{\mathrm{new}}. \end{cases}$$
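Both conditionals reduce to weighting held-out predictive densities by counts. A minimal sketch, assuming the densities $f_k^{-x}(\cdot)$ have already been computed and are passed in as arrays (all function and argument names are illustrative, not from the paper):

```python
import numpy as np

def table_conditional(n_jt, k_jt, m_k, f_x, alpha0, gamma):
    """Gibbs conditional for t_ji: existing tables first, new table last.

    n_jt : n_jt.^{-ji}, customer counts of existing tables with x_ji removed
    k_jt : dish index k_jt of each existing table
    m_k  : m_.k, table counts of existing dishes
    f_x  : f_x[k] = f_k^{-x_ji}(x_ji) for existing k; f_x[-1] is the k_new term
    """
    n_jt, m_k, f_x = (np.asarray(a, float) for a in (n_jt, m_k, f_x))
    existing = n_jt * f_x[np.asarray(k_jt)]      # n_jt.^{-ji} * f_{k_jt}(x_ji)
    # Likelihood of x_ji at a new table mixes over the dish prior m_.k and gamma.
    new_table = (m_k @ f_x[:-1] + gamma * f_x[-1]) / (m_k.sum() + gamma)
    probs = np.append(existing, alpha0 * new_table)
    return probs / probs.sum()

def dish_conditional(m_k_minus, f_xt, gamma):
    """Gibbs conditional for k_jt: existing dishes first, new dish last.

    m_k_minus : m_.k^{-jt}, table counts per dish with table t removed
    f_xt      : f_xt[k] = f_k^{-x_jt}(x_jt) for existing k; f_xt[-1] for k_new
    """
    m_k_minus, f_xt = np.asarray(m_k_minus, float), np.asarray(f_xt, float)
    probs = np.append(m_k_minus * f_xt[:-1], gamma * f_xt[-1])
    return probs / probs.sum()
```

A draw from either returned vector, e.g. with `numpy.random.Generator.choice`, gives the sampled index; the last slot corresponds to the "new table" or "new dish" option, at which point the bookkeeping described above (creating or deleting tables and components) applies.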
Data
Nematode biology abstracts
NIPS 1988-1999
Alice’s Adventures in Wonderland by Lewis Carroll