Online Inference of Topics with Latent Dirichlet Allocation
This is a paper discussed in Social Media Analysis 10-802 in Spring 2011.
Citation
Online Inference of Topics with Latent Dirichlet Allocation. Canini, Shi and Griffiths. AISTATS 2009
Online version
Summary
Traditional LDA inference methods only work in 'batch' mode, meaning they must be run over an entire document collection after it has been observed. Some collections, however, grow over time by their nature, such as streaming news text. This paper introduces online inference methods for LDA.
Methodology
In the LDA model, each document is represented as a mixture of topics, with each topic z taking a different weight. I'll skip the introduction to LDA here since I covered it in my last writeup.
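The idea of the batch Gibbs sampler, first introduced by Griffiths and Steyvers, is to repeatedly sample the topic of each observed word from its conditional distribution given all the other topic assignments:
$$P(z_j \mid \mathbf{z}_{N \setminus j}, \mathbf{w}_N) \propto \frac{n^{(w_j)}_{z_j, N \setminus j} + \beta}{n^{(\cdot)}_{z_j, N \setminus j} + W\beta} \cdot \frac{n^{(d_j)}_{z_j, N \setminus j} + \alpha}{n^{(d_j)}_{\cdot, N \setminus j} + T\alpha}$$
where $\alpha$ and $\beta$ are the hyperparameters, and $W$ and $T$ are the size of the vocabulary and the number of topics, respectively. The $n$ terms are counts taken over all words except word $j$: $n^{(w_j)}_{z_j}$ is the number of times word $w_j$ is assigned to topic $z_j$, $n^{(\cdot)}_{z_j}$ is the total number of words assigned to topic $z_j$, $n^{(d_j)}_{z_j}$ is the number of words in document $d_j$ assigned to topic $z_j$, and $n^{(d_j)}_{\cdot}$ is the total number of words in document $d_j$.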
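To make the batch sampler concrete, here is a minimal Python sketch of one way it could be implemented. It assumes the standard collapsed Gibbs conditional for LDA with symmetric hyperparameters `alpha` and `beta`, vocabulary size `W`, and `T` topics. The function names (`init_state`, `gibbs_sweep`), the toy corpus, and the hyperparameter values are all made up for illustration and are not taken from the paper.

```python
import random


def init_state(docs, W, T, rng):
    """Assign every token a random topic and build the count tables."""
    z = [[rng.randrange(T) for _ in doc] for doc in docs]
    n_wt = [[0] * T for _ in range(W)]   # word-topic counts
    n_t = [0] * T                        # total tokens per topic
    n_dt = [[0] * T for _ in docs]       # document-topic counts
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]
            n_wt[w][t] += 1
            n_t[t] += 1
            n_dt[d][t] += 1
    return z, n_wt, n_t, n_dt


def gibbs_sweep(docs, z, n_wt, n_t, n_dt, alpha, beta, W, T, rng):
    """One batch pass: resample the topic of every token in every document."""
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t_old = z[d][i]
            # Exclude the current token from all counts before sampling.
            n_wt[w][t_old] -= 1
            n_t[t_old] -= 1
            n_dt[d][t_old] -= 1
            # Unnormalised conditional per topic. The document-side
            # denominator does not depend on the topic, so it is dropped.
            weights = [
                (n_wt[w][t] + beta) / (n_t[t] + W * beta) * (n_dt[d][t] + alpha)
                for t in range(T)
            ]
            t_new = rng.choices(range(T), weights=weights)[0]
            z[d][i] = t_new
            n_wt[w][t_new] += 1
            n_t[t_new] += 1
            n_dt[d][t_new] += 1


# Toy corpus (hypothetical): each document is a list of word ids in [0, W).
docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 1, 0]]
W, T = 4, 2
alpha, beta = 0.5, 0.1  # illustrative values only
rng = random.Random(0)

z, n_wt, n_t, n_dt = init_state(docs, W, T, rng)
for _ in range(50):
    gibbs_sweep(docs, z, n_wt, n_t, n_dt, alpha, beta, W, T, rng)
```

Every sweep touches every token in the collection, which is why this sampler is 'batch': when a new document arrives, the only option within this scheme is to run further full sweeps over the enlarged collection.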