Class meeting for 10-605 LDA
This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-605 in Fall 2016.
Slides
- Lecture 1: Powerpoint (http://www.cs.cmu.edu/~wcohen/10-605/lda-1.pptx), PDF (http://www.cs.cmu.edu/~wcohen/10-605/lda-1.pdf).
- Lecture 2: Powerpoint (http://www.cs.cmu.edu/~wcohen/10-605/lda-2.pptx), PDF (http://www.cs.cmu.edu/~wcohen/10-605/lda-2.pdf).
Quiz
- No quiz for lecture 1
- Quiz for lecture 2: https://qna.cs.cmu.edu/#/pages/view/105
Readings
Basic LDA:
- Blei, David M., Andrew Y. Ng, and Michael I. Jordan. "Latent Dirichlet Allocation." Journal of Machine Learning Research 3 (2003): 993-1022.
Speedups for LDA:
- Distributed Algorithms for Topic Models (http://jmlr.csail.mit.edu/papers/volume10/newman09a/newman09a.pdf), Newman et al., JMLR 2009.
- Efficient Methods for Topic Model Inference on Streaming Document Collections (http://people.cs.umass.edu/~mimno/papers/fast-topic-model.pdf), Yao, Mimno, and McCallum, KDD 2009.
- Reducing the Sampling Complexity of Topic Models (http://dl.acm.org/citation.cfm?id=2623756), Li, Ahmed, Ravi, and Smola, KDD 2014.
- A Scalable Asynchronous Distributed Algorithm for Topic Modeling (https://dl.acm.org/citation.cfm?id=2741682), Yu, Hsieh, Yun, Vishwanathan, and Dhillon, WWW 2015.
Things to remember
- How Gibbs sampling is used to sample from a model (see the sampler sketch after this list).
- The "generative story" associated with key models like LDA, naive Bayes, and stochastic block models.
- What a "mixed membership" generative model is.
- The time complexity and storage requirements of Gibbs sampling for LDA.
- How LDA learning can be sped up using IPM (iterative parameter mixing) approaches.
- Why efficient sampling is important for LDA.
- How sampling can be sped up for many topics by preprocessing the parameters of the distribution (see the alias-table sketch after this list).
- How the storage used for LDA can be reduced by exploiting the fact that many words are rare.
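
To make the first few points concrete, here is a minimal collapsed Gibbs sampler for LDA in NumPy. This is an illustrative sketch rather than code from the lectures: the function name, hyperparameter defaults, and data layout are assumptions. It shows why each token costs O(K) to resample for K topics, and why storage is dominated by the doc-topic counts, the topic-word counts, and one topic assignment per token.

import numpy as np

def gibbs_lda(docs, K, V, alpha=0.1, beta=0.01, n_iters=200, seed=0):
    """Collapsed Gibbs sampler for LDA (illustrative sketch).
    docs: list of documents, each a list of word ids in [0, V)."""
    rng = np.random.default_rng(seed)
    D = len(docs)
    n_dk = np.zeros((D, K))   # doc-topic counts
    n_kw = np.zeros((K, V))   # topic-word counts
    n_k = np.zeros(K)         # tokens assigned to each topic
    z = []                    # one topic assignment per token

    # Random initialization of the latent topic assignments.
    for d, doc in enumerate(docs):
        zd = rng.integers(0, K, size=len(doc))
        z.append(zd)
        for w, k in zip(doc, zd):
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
                # Full conditional over topics: O(K) work per token.
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
                k = rng.choice(K, p=p / p.sum())
                # Put the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    return n_dk, n_kw

The inner loop is the bottleneck: every token in the corpus touches a length-K probability vector on every sweep, which is what the "speedups" readings attack.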
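
The "preprocessing the parameters of the distribution" idea (as in the Li et al. reading) can be illustrated with Walker's alias method: spend O(K) once to build a table for a distribution over K topics, then draw from that table in O(1) per sample, amortizing the build cost across many tokens while the distribution stays approximately fixed. The sketch below is the generic alias method under assumed names, not the full sampler from the paper, which also exploits sparsity in the document-topic counts.

import numpy as np

def build_alias_table(p):
    """O(K) preprocessing of a probability vector p over K outcomes."""
    K = len(p)
    prob = np.asarray(p, dtype=float) * K / np.sum(p)
    alias = np.zeros(K, dtype=np.int64)
    small = [k for k in range(K) if prob[k] < 1.0]
    large = [k for k in range(K) if prob[k] >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        alias[s] = l                 # excess mass of l fills s's bucket
        prob[l] -= 1.0 - prob[s]
        (small if prob[l] < 1.0 else large).append(l)
    for k in small + large:          # leftover buckets are already full
        prob[k] = 1.0
    return prob, alias

def alias_draw(prob, alias, rng):
    """Draw one outcome in O(1): pick a bucket uniformly, then flip a biased coin."""
    k = rng.integers(len(prob))
    return k if rng.random() < prob[k] else int(alias[k])

# Example: a few O(1) draws from a fixed distribution over 1000 "topics".
rng = np.random.default_rng(0)
prob, alias = build_alias_table(rng.dirichlet(np.ones(1000)))
samples = [alias_draw(prob, alias, rng) for _ in range(10)]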