# Class meeting for 10-405 LDA

From Cohen Courses


Latest revision as of 09:23, 16 April 2018.

This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-405 in Spring 2018.


### Slides

- Lecture 1: Powerpoint, PDF.
- Lecture 2: Powerpoint, PDF.

### Quiz

- No quiz for lecture 1
- Quiz for lecture 2

### Readings

Basic LDA:

- Blei, David M., Andrew Y. Ng, and Michael I. Jordan. "Latent Dirichlet allocation." Journal of Machine Learning Research 3 (Jan 2003): 993-1022.
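As a reference point for the reading, here is a minimal Python sketch of LDA's generative story. It assumes the smoothed variant with symmetric Dirichlet priors (α on each document's topic mixture θ, β on each topic's word distribution φ), which is the form usually paired with Gibbs sampling; `generate_corpus`, `dirichlet`, and their parameters are illustrative names, not part of the course materials. Because every document draws its own θ, tokens in the same document can come from different topics, which is what makes LDA a mixed-membership model.

```python
import random

def dirichlet(rng, concentration, dim):
    # Symmetric Dirichlet draw: normalize independent Gamma(concentration, 1) draws.
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(dim)]
    total = sum(draws)
    return [x / total for x in draws]

def generate_corpus(num_docs, doc_len, num_topics, vocab_size, alpha, beta, seed=0):
    """Sample a toy corpus from the (smoothed) LDA generative story.

    Returns (docs, assignments, thetas, phi); each doc is a list of word ids.
    """
    rng = random.Random(seed)
    # One word distribution per topic: phi_k ~ Dirichlet(beta).
    phi = [dirichlet(rng, beta, vocab_size) for _ in range(num_topics)]
    docs, assignments, thetas = [], [], []
    for _ in range(num_docs):
        # Mixed membership: each document gets its own mixture theta_d ~ Dirichlet(alpha).
        theta = dirichlet(rng, alpha, num_topics)
        words, topics = [], []
        for _ in range(doc_len):
            z = rng.choices(range(num_topics), weights=theta)[0]   # topic for this token
            w = rng.choices(range(vocab_size), weights=phi[z])[0]  # word drawn from that topic
            topics.append(z)
            words.append(w)
        docs.append(words)
        assignments.append(topics)
        thetas.append(theta)
    return docs, assignments, thetas, phi
```

Learning inverts this process: given only `docs`, a sampler tries to recover plausible topic assignments, mixtures, and topics.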

Speedups for LDA:

- [William's notes on fast sampling for LDA](http://www.cs.cmu.edu/~wcohen/10-605/notes/lda.pdf)

### Optional Readings

- [Distributed Algorithms for Topic Models](http://jmlr.csail.mit.edu/papers/volume10/newman09a/newman09a.pdf), Newman et al., JMLR 2009.
- Efficient Methods for Topic Model Inference on Streaming Document Collections, Yao, Mimno, and McCallum, KDD 2009.
- Reducing the Sampling Complexity of Topic Models, Li, Ahmed, Ravi, and Smola, KDD 2014.
- A Scalable Asynchronous Distributed Algorithm for Topic Modeling, Yu, Hsieh, Yun, Vishwanathan, and Dillon, WWW 2015.
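The Newman et al. reading listed above describes approximate distributed LDA (AD-LDA), which can be read as an iterative-parameter-mixing scheme: each worker runs Gibbs sweeps over its own shard of documents, starting from a copy of the shared word-topic counts, and after each round every worker's *change* to those counts is added back into the shared table. A minimal sketch of just that merge step, assuming counts are dense V × K lists of lists; `merge_word_topic_counts` is a hypothetical name and the per-worker sampling is omitted.

```python
def merge_word_topic_counts(global_counts, worker_counts):
    """Combine per-worker word-topic tables after one round of local Gibbs sweeps.

    global_counts: V x K counts that every worker started the round from.
    worker_counts: one V x K table per worker, taken after its local sweeps.
    Each worker contributes its change relative to the shared starting counts.
    """
    merged = [row[:] for row in global_counts]
    for local in worker_counts:
        for w, row in enumerate(local):
            for k, count in enumerate(row):
                merged[w][k] += count - global_counts[w][k]
    return merged
```

Because workers sample against counts that go stale during a round, the result only approximates a sequential Gibbs sampler; per-topic totals can be recomputed from the merged table before the next round.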

### Things to remember

- How Gibbs sampling is used to sample from a model.
- The "generative story" associated with key models like LDA, naive Bayes, and stochastic block models.
- What a "mixed membership" generative model is.
- The time complexity and storage requirements of Gibbs sampling for LDA.
- How LDA learning can be sped up using iterative parameter mixing (IPM) approaches.

- Why efficient sampling is important for LDA.
- How sampling can be sped up for many topics by preprocessing the parameters of the distribution.
- How the storage used for LDA can be reduced by exploiting the fact that many words are rare.
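The points above about Gibbs sampling cost and storage can be illustrated together. In the collapsed Gibbs sampler, the topic of a token with word w in document d is drawn with weight (α + n_dk)(β + n_wk) / (βV + n_k) for each topic k, where n_dk, n_wk, and n_k are document-topic, word-topic, and topic-total counts excluding that token; evaluating this for every topic costs O(K) per token. One published speedup, the bucket decomposition of Yao, Mimno, and McCallum listed above (often called SparseLDA), rewrites the weight as αβ/(βV + n_k) + β·n_dk/(βV + n_k) + (α + n_dk)·n_wk/(βV + n_k). The first ("s") term depends only on topic totals, so its sum can be precomputed and patched cheaply; the other two ("r" and "q") are nonzero only for topics present in the document or already used by the word. The following is a minimal sketch under these assumptions: symmetric priors, per-document and per-word counts stored as dicts of nonzero topics (which is also what keeps storage small for rare words), and hypothetical function names. It is not the course's reference implementation.

```python
import random

def s_bucket_mass(n_k, alpha, beta, vocab_size):
    # Depends only on topic totals, so it can be computed once and then patched.
    return sum(alpha * beta / (beta * vocab_size + c) for c in n_k)

def _bump(counts, k, delta):
    # Keep only nonzero entries, so a rare word stores just the few topics it uses.
    new = counts.get(k, 0) + delta
    if new:
        counts[k] = new
    else:
        del counts[k]

def sparse_sample(n_dk, n_wk, n_k, alpha, beta, vocab_size, s_mass, rng):
    """Draw a topic from the collapsed Gibbs conditional via the s/r/q bucket split.

    n_dk and n_wk are dicts of nonzero counts, so the r and q buckets cost
    O(topics in this document) and O(topics used by this word), not O(K).
    """
    def denom(k):
        return beta * vocab_size + n_k[k]

    r_terms = {k: c * beta / denom(k) for k, c in n_dk.items()}
    q_terms = {k: (alpha + n_dk.get(k, 0)) * c / denom(k) for k, c in n_wk.items()}
    r_mass = sum(r_terms.values())
    q_mass = sum(q_terms.values())
    u = rng.random() * (s_mass + r_mass + q_mass)
    # Try the q bucket first (it usually holds most of the mass), then r.
    for terms, mass in ((q_terms, q_mass), (r_terms, r_mass)):
        if u < mass:
            for k, weight in terms.items():
                u -= weight
                if u < 0:
                    return k
            return k  # floating-point round-off: use the bucket's last topic
        u -= mass
    # s bucket: the only dense O(K) scan, reached with small probability.
    for k in range(len(n_k)):
        u -= alpha * beta / denom(k)
        if u < 0:
            return k
    return len(n_k) - 1

def gibbs_sweep(docs, z, doc_counts, word_counts, n_k, alpha, beta, vocab_size, rng):
    """One collapsed Gibbs pass over every token, updating all counts in place."""
    ab, bv = alpha * beta, beta * vocab_size
    s_mass = s_bucket_mass(n_k, alpha, beta, vocab_size)
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            old = z[d][i]
            # Remove this token's current assignment, patching s for one topic.
            _bump(doc_counts[d], old, -1)
            _bump(word_counts[w], old, -1)
            s_mass += ab / (bv + n_k[old] - 1) - ab / (bv + n_k[old])
            n_k[old] -= 1
            new = sparse_sample(doc_counts[d], word_counts[w], n_k,
                                alpha, beta, vocab_size, s_mass, rng)
            # Add the new assignment back in.
            _bump(doc_counts[d], new, 1)
            _bump(word_counts[w], new, 1)
            s_mass += ab / (bv + n_k[new] + 1) - ab / (bv + n_k[new])
            n_k[new] += 1
            z[d][i] = new
```

The sampler draws from exactly the same distribution as the dense O(K) formula; only the order in which probability mass is scanned changes. A production version would also guard against floating-point drift in the patched `s_mass`, for example by recomputing it periodically.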