Class meeting for 10-405 LDA

Latest revision as of 09:23, 16 April 2018

This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-405 in Spring 2018.

Slides

Quiz

Readings

Basic LDA:

  • Blei, David M., Andrew Y. Ng, and Michael I. Jordan. "Latent Dirichlet Allocation." Journal of Machine Learning Research 3 (Jan 2003): 993-1022.

Speedups for LDA:

  • William's notes on fast sampling for LDA: http://www.cs.cmu.edu/~wcohen/10-605/notes/lda.pdf

Optional Readings

  • Distributed Algorithms for Topic Models, Newman et al., JMLR 2009: http://jmlr.csail.mit.edu/papers/volume10/newman09a/newman09a.pdf

Things to remember

  • How Gibbs sampling is used to sample from a model.
  • The "generative story" associated with key models like LDA, naive Bayes, and stochastic block models.
  • What a "mixed membership" generative model is.
  • The time complexity and storage requirements of Gibbs sampling for LDAs.
  • How LDA learning can be sped up using IPM (iterative parameter mixing) approaches.
  • Why efficient sampling is important for LDAs.
  • How sampling can be sped up for many topics by preprocessing the parameters of the distribution.
  • How the storage used for LDA can be reduced by exploiting the fact that many words are rare.
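The points above can be made concrete with a minimal collapsed Gibbs sampler for LDA. This is a sketch, not course code: the function name, hyperparameter defaults, and toy corpus below are illustrative assumptions. Each sweep removes a token's current topic assignment from the count tables, samples a new topic from the conditional distribution, and restores the counts.

```python
import numpy as np

def lda_gibbs(docs, n_topics, n_vocab, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).

    docs: list of documents, each a list of integer word ids in [0, n_vocab).
    Returns the doc-topic and topic-word count matrices.
    """
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), n_topics))  # doc-topic counts
    n_kw = np.zeros((n_topics, n_vocab))    # topic-word counts
    n_k = np.zeros(n_topics)                # per-topic totals

    # Random initial topic assignment for every token.
    z = [rng.integers(n_topics, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove this token's current assignment from the counts.
                k = z[d][i]
                n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
                # p(z = k | rest) ∝ (n_dk + alpha) * (n_kw + beta) / (n_k + V*beta)
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + n_vocab * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                # Record the new assignment and restore the counts.
                z[d][i] = k
                n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    return n_dk, n_kw

# Toy usage: two tiny "topics" (words {0,1} vs. {2,3}).
docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 1, 0]]
n_dk, n_kw = lda_gibbs(docs, n_topics=2, n_vocab=4, n_iters=50)
```

Each sweep touches every token once and scores all K topics per token, i.e. O(N·K) time for N tokens, with O(D·K + K·V) count storage — the costs the list above refers to. The "speedups" readings attack the per-token K factor, e.g. by exploiting sparsity in the count tables rather than scoring every topic for every token.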