Difference between revisions of "Blei et al Latent Dirichlet Allocation"

Revision as of 19:59, 3 October 2012

Citation

author = {Blei, David M. and Ng, Andrew Y. and Jordan, Michael I.},
title = {Latent dirichlet allocation},
journal = {J. Mach. Learn. Res.},
issue_date = {3/1/2003},
volume = {3},
month = mar,
year = {2003},
issn = {1532-4435},
pages = {993--1022},
numpages = {30},
url = {http://dl.acm.org/citation.cfm?id=944919.944937},
acmid = {944937},
publisher = {JMLR.org}

Online Version

Latent Dirichlet Allocation

Summary

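This paper addresses the problem of document modeling: finding a compact probabilistic description of the documents in a corpus that preserves the statistical structure useful for tasks such as classification and collaborative filtering.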

LDA

LDA is a generative probabilistic model for discrete data such as text corpora. It is a three-level hierarchical Bayesian model in which each item of a collection (here, each document) is modeled as a finite mixture over an underlying (latent) set of topics, and each topic is characterized by a distribution over words. Each document $d$ in the corpus is assumed to be generated by the following process:

1. The author chooses the number of words $N_{d}$ in the document by drawing from a Poisson( $\xi$ ) distribution.
2. The author then draws a vector of topic proportions $\theta _{d}$ from a Dirichlet( $\alpha$ ) distribution. This vector parametrizes a Multinomial( $\theta _{d}$ ) topic generator for the document.
3. For each word $w_{d,n}$ of the $N_{d}$ words:
  a. A topic $z_{d,n}$ is drawn from the Multinomial( $\theta _{d}$ ) distribution.
  b. The word $w_{d,n}$ is drawn from the topic-specific word distribution parametrized by $z_{d,n}$ and $\beta$.

The parameters $\alpha$ and $\beta$ are corpus-level parameters, assumed to be sampled only once in the process of generating a corpus. The variable $\theta _{d}$ is a document-level variable and is sampled once per document. Finally, the variables $z_{d,n}$ and $w_{d,n}$ are word-level variables and are sampled once for each word in each document. What distinguishes LDA from simpler mixture models is that it has three levels and, notably, that the topic node is sampled repeatedly within a document. This allows a document to be associated with multiple topics rather than just one.
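The generative process above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `generate_document`, the toy values of `alpha`, `beta` and `xi`, and the representation of words as integer vocabulary indices are all hypothetical and do not come from the paper.

```python
import numpy as np

def generate_document(alpha, beta, xi, rng):
    """Sample one document from the LDA generative process (illustrative sketch).

    alpha: (K,) Dirichlet parameter over K topics.
    beta:  (K, V) matrix; row k is the word distribution of topic k.
    xi:    mean of the Poisson distribution over document length.
    """
    num_topics, vocab_size = beta.shape
    n_words = rng.poisson(xi)            # step 1: N_d ~ Poisson(xi)
    theta = rng.dirichlet(alpha)         # step 2: theta_d ~ Dirichlet(alpha)
    # step 3a: one topic per word, z_{d,n} ~ Multinomial(theta_d)
    topics = rng.choice(num_topics, size=n_words, p=theta)
    # step 3b: each word is drawn from the word distribution of its topic
    words = np.array([rng.choice(vocab_size, p=beta[z]) for z in topics], dtype=int)
    return theta, topics, words

# Toy setting (assumed values): 3 topics over a 4-word vocabulary.
rng = np.random.default_rng(0)
alpha = np.array([0.5, 0.5, 0.5])
beta = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.4, 0.4],
])
theta, topics, words = generate_document(alpha, beta, xi=20, rng=rng)
```

Because `theta` is drawn once per document while a topic is drawn for every word, a single document can mix words from several topics, which is the property highlighted in the paragraph above.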

Inference

The posterior distribution of the hidden variables given a document is, in general, intractable to compute exactly. However, a variety of approximate inference techniques, including Laplace approximation, variational approximation, and Markov chain Monte Carlo, can be used to estimate it. The paper describes a convexity-based variational inference method, which it combines with an EM algorithm for empirical Bayes parameter estimation.

The basic idea is to use Jensen's inequality to obtain a lower bound on the log likelihood that is indexed by a set of variational parameters. The variational parameters are chosen by an optimization procedure that attempts to find the tightest possible lower bound. The authors show that this is equivalent to minimizing the KL divergence between the variational distribution and the true posterior. Setting the derivatives of the KL divergence to zero yields a pair of interdependent update equations, which can be solved by an iterative fixed-point method.
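For reference, the bound and the two updates can be written out as follows. The symbols $\gamma$ (a Dirichlet parameter) and $\phi _{n}$ (per-word multinomial parameters) are the paper's variational parameters and are not defined elsewhere on this page; the document index $d$ is dropped for readability, and $\Psi$ denotes the digamma function.

```latex
\begin{align*}
q(\theta, z \mid \gamma, \phi) &= q(\theta \mid \gamma) \prod_{n=1}^{N} q(z_n \mid \phi_n) \\
\log p(w \mid \alpha, \beta) &= L(\gamma, \phi; \alpha, \beta)
  + \mathrm{KL}\bigl( q(\theta, z \mid \gamma, \phi) \,\|\, p(\theta, z \mid w, \alpha, \beta) \bigr) \\
L(\gamma, \phi; \alpha, \beta) &= \mathrm{E}_q[\log p(\theta, z, w \mid \alpha, \beta)]
  - \mathrm{E}_q[\log q(\theta, z)] \\
\phi_{ni} &\propto \beta_{i w_n} \exp\Bigl( \Psi(\gamma_i) - \Psi\bigl(\textstyle\sum_{j} \gamma_j\bigr) \Bigr) \\
\gamma_i &= \alpha_i + \sum_{n=1}^{N} \phi_{ni}
\end{align*}
```

Since the KL term is non-negative, $L$ is a lower bound on the log likelihood, and maximizing $L$ over $\gamma$ and $\phi$ is the same as minimizing the KL divergence. The last two lines are the interdependent updates: each $\phi _{n}$ depends on $\gamma$ and $\gamma$ depends on every $\phi _{n}$, so they are iterated until they stop changing.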

Parameter Estimation

It remains to estimate the parameters $\alpha$ and $\beta$ of the LDA model. The paper provides an empirical Bayes method for this. Given a corpus of documents $D$, we wish to find parameters $\alpha$ and $\beta$ that maximize the (marginal) log likelihood of the data: $l(\alpha ,\beta )=\sum _{d\in D}\log p(w_{d}|\alpha ,\beta )$, where $w_{d}$ denotes the words of document $d$.

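Because the marginal likelihood cannot be computed tractably, the variational lower bound from the inference step is maximized instead. This leads to the following iterative variational EM algorithm:

 1. E step: For each document, find the optimizing values of the variational parameters.
 2. M step: Maximize the resulting lower bound on the log likelihood with respect to the model parameters $\alpha ,\beta$.

The two steps are repeated until the lower bound on the log likelihood converges.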
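The E and M steps can be sketched as below. This is a simplified illustration, not the paper's full procedure: $\alpha$ is held fixed rather than re-estimated (the paper updates it with a Newton-Raphson method), the digamma function is approximated by a small hand-written helper, a tiny smoothing constant is added to the expected counts, and a fixed number of iterations stands in for a convergence check. The names `digamma`, `e_step` and `variational_em` and the toy corpus are hypothetical.

```python
import numpy as np

def digamma(x):
    """Approximate digamma for x > 0: shift the argument by 6, then use an asymptotic series."""
    x = np.asarray(x, dtype=float)
    shift = sum(1.0 / (x + k) for k in range(6))   # psi(x) = psi(x + 6) - sum_k 1/(x + k)
    y = x + 6.0
    inv2 = 1.0 / (y * y)
    series = np.log(y) - 0.5 / y - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))
    return series - shift

def e_step(words, alpha, beta, n_iter=50):
    """Fixed-point updates of one document's variational parameters (gamma, phi)."""
    num_topics = alpha.shape[0]
    phi = np.full((len(words), num_topics), 1.0 / num_topics)
    gamma = alpha + len(words) / num_topics
    for _ in range(n_iter):
        # phi[n, i] is proportional to beta[i, w_n] * exp(digamma(gamma[i])); the
        # digamma(sum(gamma)) term is constant in i and cancels when normalizing.
        phi = beta[:, words].T * np.exp(digamma(gamma))
        phi /= phi.sum(axis=1, keepdims=True)
        gamma = alpha + phi.sum(axis=0)
    return gamma, phi

def variational_em(docs, vocab_size, num_topics, alpha_value=0.1, n_em_iter=20, seed=0):
    """Alternate E and M steps; alpha is kept fixed here as a simplification."""
    rng = np.random.default_rng(seed)
    alpha = np.full(num_topics, alpha_value)
    beta = rng.dirichlet(np.ones(vocab_size), size=num_topics)   # random initial topics
    for _ in range(n_em_iter):
        counts = np.full((num_topics, vocab_size), 1e-9)         # smoothing (assumption)
        gammas = []
        for words in docs:
            gamma, phi = e_step(words, alpha, beta)              # E step
            for n, w in enumerate(words):
                counts[:, w] += phi[n]                           # expected topic-word counts
            gammas.append(gamma)
        beta = counts / counts.sum(axis=1, keepdims=True)        # M step for beta
    return beta, np.array(gammas)

# Toy corpus (assumed): documents are lists of word indices in a 4-word vocabulary.
docs = [[0, 1, 0, 1, 0], [1, 0, 0, 1, 1], [2, 3, 3, 2, 2], [3, 2, 3, 3, 2]]
beta, gammas = variational_em(docs, vocab_size=4, num_topics=2)
```

Each row of `beta` is an estimated topic (a distribution over the vocabulary), and each row of `gammas` summarizes how strongly the corresponding document is associated with each topic.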

Experiments

LDA is empirically evaluated in several problem domains: document modeling, document classification, and collaborative filtering.

Study Plan

1. Mixture models: http://en.wikipedia.org/wiki/Mixture_model
2. Probabilistic Latent Semantic Indexing: http://www.cs.brown.edu/~th/papers/Hofmann-SIGIR99.pdf
3. Variational Bayesian Methods: http://en.wikipedia.org/wiki/Variational_Bayesian_methods
4. KL divergence
5. Variational Inference lecture pdf by Blei: http://www.cs.princeton.edu/courses/archive/fall11/cos597C/lectures/variational-inference-i.pdf
