Class meeting for 10-605 LDA
From Cohen Courses
Revision as of 09:50, 22 November 2016
This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-605 in Fall 2016.
Slides
- Powerpoint: http://www.cs.cmu.edu/~wcohen/10-605/lda-1.pptx
- PDF: http://www.cs.cmu.edu/~wcohen/10-605/lda-1.pdf
Quiz
- No quiz today
Readings
- Distributed Algorithms for Topic Models, Newman et al., JMLR 2009.
- Efficient Methods for Topic Model Inference on Streaming Document Collections, Yao, Mimno, & McCallum, KDD 2009.
- Reducing the sampling complexity of topic models, Li, Ahmed, Ravi, & Smola, KDD 2014.
- LightLDA: Big Topic Models on Modest Compute Clusters, Yuan, Gao, Ho, Dai, Wei, Zheng, Xing, Liu, & Ma, 2015.
Things to remember
- How Gibbs sampling is used to sample from a model.
- The "generative story" associated with key models like LDA, naive Bayes, and stochastic block models.
- What a "mixed membership" generative model is.
- The time complexity and storage requirements of Gibbs sampling for LDA.
- How LDA learning can be sped up using IPM (iterative parameter mixing) approaches.
- Why efficient sampling is important for LDA.
- How sampling can be sped up when there are many topics by preprocessing the parameters of the distribution.
- How the storage used for LDA can be reduced by exploiting the fact that many words are rare.
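
As a study aid for the points above on Gibbs sampling and its cost, here is a minimal sketch of a collapsed Gibbs sampler for LDA. It is not taken from the slides or the readings. The function name `lda_gibbs`, its parameters, and the default hyperparameters `alpha` and `beta` are assumptions chosen for illustration. The sketch uses dense count tables and recomputes all K topic weights for every token, so one sweep costs O(N*K) for N tokens, and storage is O(D*K + K*V) plus one topic assignment per token. The faster samplers in the readings aim to reduce exactly this per-token cost and the size of these tables.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch, not optimized).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (z, doc_topic, topic_word, topic_total).
    """
    rng = random.Random(seed)
    K, V = num_topics, vocab_size

    # Dense count tables: O(D*K) + O(K*V) storage.
    doc_topic = [[0] * K for _ in docs]       # tokens in doc d assigned to topic k
    topic_word = [[0] * V for _ in range(K)]  # times word w is assigned to topic k
    topic_total = [0] * K                     # total tokens assigned to topic k

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove this token's current assignment from the counts.
                k_old = z[d][i]
                doc_topic[d][k_old] -= 1
                topic_word[k_old][w] -= 1
                topic_total[k_old] -= 1

                # Unnormalized conditional over topics: O(K) work per token.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + V * beta)
                    for k in range(K)
                ]
                k_new = rng.choices(range(K), weights=weights)[0]

                # Record the new assignment and restore the counts.
                z[d][i] = k_new
                doc_topic[d][k_new] += 1
                topic_word[k_new][w] += 1
                topic_total[k_new] += 1

    return z, doc_topic, topic_word, topic_total
```

Most entries of `topic_word` stay at zero for rare words, which is why sparse representations of that table can save a great deal of space.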