Online Inference of Topics with Latent Dirichlet Allocation


This is a Paper discussed in Social Media Analysis 10-802 in Spring 2011.

Citation

Online Inference of Topics with Latent Dirichlet Allocation. Canini, Shi and Griffiths. AISTATS 2009.

Online version

download here

Summary

Traditional LDA inference methods work only in 'batch' mode, meaning they must run over an entire document collection after all of its documents have been observed. Some collections, however, grow over time by their nature, such as streaming news text. This paper introduces online inference algorithms for LDA.

Methodology

As we all know, in the LDA model each document is represented as a mixture of topics, with each topic z taking a different weight. I'll skip the introduction of LDA here, since I covered it in my last writeup.

The idea of the batch Gibbs sampler, first introduced by Griffiths and Steyvers, is to repeatedly sample the topic of each observed word according to its conditional distribution:

P(z_i = j \mid \mathbf{z}_{-i}, \mathbf{w}) \propto \frac{n^{(w_i)}_{-i,j} + \beta}{n^{(\cdot)}_{-i,j} + W\beta} \cdot \frac{n^{(d_i)}_{-i,j} + \alpha}{n^{(d_i)}_{-i,\cdot} + T\alpha}

where \alpha and \beta are the hyperparameters, W and T are the sizes of the vocabulary and the set of topics respectively, and n^{(w_i)}_{-i,j} is the number of times word w_i is assigned to topic j, excluding the current assignment z_i (the other counts are defined analogously, with \cdot denoting a sum over that index).
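
To make this update concrete, here is a minimal sketch of one sweep of the sampler in Python. The function name gibbs_sweep, the count arrays n_wt, n_dt, n_t, and the flat words/docs/z arrays are illustrative assumptions of mine, not notation from the paper.

import numpy as np

def gibbs_sweep(words, docs, z, n_wt, n_dt, n_t, alpha, beta, rng):
    """Resample the topic assignment of every observed word once.

    words[i], docs[i], z[i]: word id, document id, and current topic of token i.
    n_wt[w, t]: count of word w assigned to topic t; n_dt[d, t]: count of
    tokens in document d assigned to topic t; n_t[t]: total tokens in topic t.
    """
    W, T = n_wt.shape
    for i in range(len(words)):
        w, d, t_old = words[i], docs[i], z[i]
        # Remove token i's current assignment from all counts (the "-i" in the formula).
        n_wt[w, t_old] -= 1
        n_dt[d, t_old] -= 1
        n_t[t_old] -= 1
        # Unnormalized conditional over topics; the document-side denominator
        # (n_dt[d].sum() + T*alpha) is constant across topics, so it is dropped.
        p = (n_wt[w] + beta) / (n_t + W * beta) * (n_dt[d] + alpha)
        p /= p.sum()
        # Sample a new topic and restore the counts.
        t_new = rng.choice(T, p=p)
        z[i] = t_new
        n_wt[w, t_new] += 1
        n_dt[d, t_new] += 1
        n_t[t_new] += 1

Repeating such sweeps until the chain mixes gives the batch sampler; the paper's online variants instead update these counts incrementally as new documents arrive.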