Class meeting for 10-605 LDA
This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-605 in Fall 2016.
Slides
- Lecture 1: Powerpoint (http://www.cs.cmu.edu/~wcohen/10-605/lda-1.pptx), PDF (http://www.cs.cmu.edu/~wcohen/10-605/lda-1.pdf).
- Lecture 2: Powerpoint (http://www.cs.cmu.edu/~wcohen/10-605/lda-2.pptx), PDF (http://www.cs.cmu.edu/~wcohen/10-605/lda-2.pdf).
Quiz
- No quiz for lecture 1
- Quiz for lecture 2: https://qna.cs.cmu.edu/#/pages/view/105
Readings
Basic LDA:
- Blei, David M., Andrew Y. Ng, and Michael I. Jordan. "Latent Dirichlet Allocation." Journal of Machine Learning Research 3 (2003): 993-1022.
Speedups for LDA:
- Distributed Algorithms for Topic Models (http://jmlr.csail.mit.edu/papers/volume10/newman09a/newman09a.pdf), Newman et al., JMLR 2009.
- Efficient Methods for Topic Model Inference on Streaming Document Collections (http://people.cs.umass.edu/~mimno/papers/fast-topic-model.pdf), Yao, Mimno, and McCallum, KDD 2009.
- Reducing the Sampling Complexity of Topic Models (http://dl.acm.org/citation.cfm?id=2623756), Li, Ahmed, Ravi, and Smola, KDD 2014.
- A Scalable Asynchronous Distributed Algorithm for Topic Modeling (https://dl.acm.org/citation.cfm?id=2741682), Yu, Hsieh, Yun, Vishwanathan, and Dhillon, WWW 2015.
Things to remember
- How Gibbs sampling is used to sample from a model (see the sampler sketch after this list).
- The "generative story" associated with key models like LDA, naive Bayes, and stochastic block models.
- What a "mixed membership" generative model is.
- The time complexity and storage requirements of Gibbs sampling for LDA.
- How LDA learning can be sped up using IPM (iterative parameter mixing) approaches.
- Why efficient sampling is important for LDA.
- How sampling can be sped up for many topics by preprocessing the parameters of the distribution (see the alias-table sketch after this list).
- How the storage used for LDA can be reduced by exploiting the fact that many words are rare.
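
To make the first few points concrete, here is a minimal collapsed Gibbs sampler for LDA in NumPy. This is an illustrative sketch rather than code from the lectures: the function name, hyperparameter defaults, and data layout are assumptions. It shows why each token costs O(K) to resample for K topics, and why storage is dominated by the doc-topic counts, the topic-word counts, and one topic assignment per token.

import numpy as np

def gibbs_lda(docs, K, V, alpha=0.1, beta=0.01, n_iters=200, seed=0):
    """Collapsed Gibbs sampler for LDA (illustrative sketch).
    docs: list of documents, each a list of word ids in [0, V)."""
    rng = np.random.default_rng(seed)
    D = len(docs)
    n_dk = np.zeros((D, K))   # doc-topic counts
    n_kw = np.zeros((K, V))   # topic-word counts
    n_k = np.zeros(K)         # tokens assigned to each topic
    z = []                    # one topic assignment per token

    # Random initialization of the latent topic assignments.
    for d, doc in enumerate(docs):
        zd = rng.integers(0, K, size=len(doc))
        z.append(zd)
        for w, k in zip(doc, zd):
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
                # Full conditional over topics: O(K) work per token.
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
                k = rng.choice(K, p=p / p.sum())
                # Put the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    return n_dk, n_kw

The inner loop is the bottleneck: every token in the corpus touches a length-K probability vector on every sweep, which is what the "speedups" readings attack.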
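
The "preprocessing the parameters of the distribution" idea (as in the Li et al. reading) can be illustrated with Walker's alias method: spend O(K) once to build a table for a distribution over K topics, then draw from that table in O(1) per sample, amortizing the build cost across many tokens while the distribution stays approximately fixed. The sketch below is the generic alias method under assumed names, not the full sampler from the paper, which also exploits sparsity in the document-topic counts.

import numpy as np

def build_alias_table(p):
    """O(K) preprocessing of a probability vector p over K outcomes."""
    K = len(p)
    prob = np.asarray(p, dtype=float) * K / np.sum(p)
    alias = np.zeros(K, dtype=np.int64)
    small = [k for k in range(K) if prob[k] < 1.0]
    large = [k for k in range(K) if prob[k] >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        alias[s] = l                 # excess mass of l fills s's bucket
        prob[l] -= 1.0 - prob[s]
        (small if prob[l] < 1.0 else large).append(l)
    for k in small + large:          # leftover buckets are already full
        prob[k] = 1.0
    return prob, alias

def alias_draw(prob, alias, rng):
    """Draw one outcome in O(1): pick a bucket uniformly, then flip a biased coin."""
    k = rng.integers(len(prob))
    return k if rng.random() < prob[k] else int(alias[k])

# Example: a few O(1) draws from a fixed distribution over 1000 "topics".
rng = np.random.default_rng(0)
prob, alias = build_alias_table(rng.dirichlet(np.ones(1000)))
samples = [alias_draw(prob, alias, rng) for _ in range(10)]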