Online Inference of Topics with Latent Dirichlet Allocation

From Cohen Courses
Revision as of 00:20, 1 April 2011 by Yandongl (talk | contribs)

This is a Paper discussed in Social Media Analysis 10-802 in Spring 2011.

Citation

Online Inference of Topics with Latent Dirichlet Allocation. Canini, Shi and Griffiths. AISTATS 2009

Online version

download here

Summary

Traditional LDA inference methods only work in 'batch' mode, meaning they must process the entire document collection after all of its documents have been observed. Some collections, however, grow over time by their nature, such as streaming news text. This paper introduces online inference algorithms for LDA that can update topic assignments as new documents arrive.

Methodology

In the LDA model, each document is represented as a mixture of topics, with each topic z taking a different weight. I'll skip the introduction of LDA here since I covered it in my last writeup.

The idea of the batch Gibbs sampler, first introduced by Griffiths and Steyvers, is to repeatedly resample the topic of each observed word according to its conditional distribution given all other assignments:

P(z_i = j \mid \mathbf{z}_{-i}, \mathbf{w}) \propto \frac{n^{(w_i)}_{-i,j} + \beta}{n^{(\cdot)}_{-i,j} + W\beta} \cdot \frac{n^{(d_i)}_{-i,j} + \alpha}{n^{(d_i)}_{-i,\cdot} + T\alpha}

where \alpha and \beta are the hyperparameters, and W and T are the size of the vocabulary and the number of topics, respectively.
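As a concrete illustration, here is a minimal sketch of a collapsed Gibbs sampler for LDA in Python. It is not the authors' code; the function name and corpus format (a list of word-id lists) are my own choices. The count arrays mirror the n terms in the conditional above, and the document-length denominator is dropped since it is constant across topics for a fixed word.

```python
import numpy as np

def gibbs_lda(docs, n_topics, vocab_size, alpha=0.1, beta=0.01,
              n_iters=50, seed=0):
    """Batch collapsed Gibbs sampling for LDA (hypothetical sketch).

    docs: list of documents, each a list of integer word ids.
    Returns topic assignments z, word-topic counts, doc-topic counts.
    """
    rng = np.random.default_rng(seed)
    n_wt = np.zeros((vocab_size, n_topics))  # n^{(w)}_j: word-topic counts
    n_dt = np.zeros((len(docs), n_topics))   # n^{(d)}_j: doc-topic counts
    n_t = np.zeros(n_topics)                 # n^{(.)}_j: topic totals

    # Random initial topic assignment for every token.
    z = [rng.integers(n_topics, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]
            n_wt[w, t] += 1; n_dt[d, t] += 1; n_t[t] += 1

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                # Remove the current assignment: the "-i" counts.
                n_wt[w, t] -= 1; n_dt[d, t] -= 1; n_t[t] -= 1
                # Conditional P(z_i = j | z_-i, w), up to a constant.
                p = ((n_wt[w] + beta) / (n_t + vocab_size * beta)
                     * (n_dt[d] + alpha))
                t = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = t
                n_wt[w, t] += 1; n_dt[d, t] += 1; n_t[t] += 1
    return z, n_wt, n_dt
```

The online variants in the paper modify this loop: instead of sweeping repeatedly over the whole corpus, they sample topics for newly arrived words conditioned on the assignments made so far.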