Class meeting for 10-605 LDA 2
From Cohen Courses
This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-605 in Fall 2016.
Slides
- TBD
Readings
- Efficient Methods for Topic Model Inference on Streaming Document Collections, Yao, Mimno, & McCallum, KDD 2009.
- Reducing the Sampling Complexity of Topic Models, Li, Ahmed, Ravi, & Smola, KDD 2014.
- LightLDA: Big Topic Models on Modest Compute Clusters, Yuan, Gao, Ho, Dai, Wei, Zheng, Xing, Liu, & Ma, 2015.
Things to Remember
- Why efficient sampling is important for LDA
- How sampling can be sped up for many topics by preprocessing the parameters of the distribution
- How the storage used for LDA can be reduced by exploiting the fact that many words are rare
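To make the preprocessing idea concrete: one standard technique, used in the Li et al. and LightLDA readings, is Walker's alias method. It spends O(K) time building a table from a K-topic distribution, after which each draw costs O(1) instead of O(K). The sketch below is a minimal illustration under stated assumptions, not the code from any of the readings; the names `build_alias_table` and `sample` are hypothetical, and it omits the Metropolis-Hastings correction those papers need when the table is built from stale parameters.

```python
import random


def build_alias_table(probs):
    """Preprocess a discrete distribution into an alias table in O(K) time.

    probs: non-negative weights (need not sum to 1).
    Returns (accept, alias): bucket i yields outcome i with probability
    accept[i], and outcome alias[i] otherwise.
    """
    k = len(probs)
    total = sum(probs)
    # Rescale so the average bucket mass is exactly 1.
    scaled = [p * k / total for p in probs]
    accept = [1.0] * k
    alias = list(range(k))
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        # Bucket s keeps its own mass and borrows the rest from l.
        accept[s] = scaled[s]
        alias[s] = l
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    # Any leftover buckets have mass 1 up to rounding; accept stays 1.0.
    return accept, alias


def sample(accept, alias, rng=random):
    """Draw one outcome in O(1): pick a bucket, then flip a biased coin."""
    i = rng.randrange(len(accept))
    return i if rng.random() < accept[i] else alias[i]


if __name__ == "__main__":
    # Hypothetical 4-topic distribution, for illustration only.
    accept, alias = build_alias_table([0.5, 0.3, 0.1, 0.1])
    draws = [sample(accept, alias) for _ in range(10)]
    print(draws)
```

The table only pays off when many draws are made from the same distribution, which is why the parameters are preprocessed once and reused across many tokens rather than rebuilt for every sample.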