Difference between revisions of "Class meeting for 10-605 2013 LDA 2"
From Cohen Courses
Latest revision as of 17:57, 4 December 2015
This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-605 in Fall 2015.
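Slides
- Scaling up LDA 2/2 (http://www.cs.cmu.edu/~wcohen/10-605/topic-models-2.ppt), also as PDF (http://www.cs.cmu.edu/~wcohen/10-605/topic-models-2.pdf)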
Readings
- Efficient Methods for Topic Model Inference on Streaming Document Collections, Yao, Mimno, & McCallum, KDD 2009
- Reducing the sampling complexity of topic models, Li, Ahmed, Ravi, & Smola, KDD 2014
- LightLDA: Big Topic Models on Modest Compute Clusters, Jinhui Yuan, Fei Gao, Qirong Ho, Wei Dai, Jinliang Wei, Xun Zheng, Eric P. Xing, Tie-Yan Liu, Wei-Ying Ma, 2015
Things to Remember
- Why efficient sampling is important for LDA
- How sampling can be sped up for many topics by preprocessing the parameters of the distribution
- How the storage used for LDA can be reduced by exploiting the fact that many words are rare
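
As an illustration of the second point, one well-known way to preprocess the parameters of a discrete distribution is the alias method, which spends O(K) time building a table once and then draws each sample in O(1), independent of the number of topics K. The sketch below is a minimal Python version under stated assumptions: the function names `build_alias_table` and `sample_topic` are hypothetical, the weights stand in for unnormalized topic probabilities, and this is not taken from the lecture slides. Applying it inside a Gibbs sampler for LDA also requires handling the fact that the topic distribution changes as counts are updated, which this sketch does not address.

```python
import random


def build_alias_table(weights):
    """Preprocess unnormalized weights into (prob, alias) tables in O(K) time."""
    k = len(weights)
    total = float(sum(weights))
    # Scale so that the average weight is exactly 1.
    scaled = [w * k / total for w in weights]
    prob = [0.0] * k
    alias = [0] * k
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        # Slot s keeps its own mass and borrows the remainder from l.
        prob[s] = scaled[s]
        alias[s] = l
        scaled[l] = scaled[l] + scaled[s] - 1.0
        if scaled[l] < 1.0:
            small.append(l)
        else:
            large.append(l)
    # Leftover slots hold (up to rounding) exactly one unit of mass.
    for i in large + small:
        prob[i] = 1.0
        alias[i] = i
    return prob, alias


def sample_topic(prob, alias, rng=random):
    """Draw one index in O(1): pick a slot uniformly, then flip a biased coin."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```

A typical use would build the table once for a given set of topic weights, for example `prob, alias = build_alias_table([1.0, 1.0, 2.0])`, and then call `sample_topic(prob, alias)` many times; the per-sample cost does not grow with the number of topics.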