Difference between revisions of "Class meeting for 10-605 Randomized Algorithms"
From Cohen Courses
Jump to navigationJump to search(10 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
− | This is one of the class meetings on the [[Syllabus for Machine Learning with Large Datasets 10-605 in Fall | + | This is one of the class meetings on the [[Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2017|schedule]] for the course [[Machine Learning with Large Datasets 10-605 in Fall_2017]]. |
=== Slides === | === Slides === | ||
− | * Lecture 1 [http://www.cs.cmu.edu/~wcohen/10-605/randomized-1.pptx Powerpoint], [http://www.cs.cmu.edu/~wcohen/10-605/randomized-1.pdf PDF]. | + | * Lecture 1 [http://www.cs.cmu.edu/~wcohen/10-605/randomized-1.pptx Powerpoint], [http://www.cs.cmu.edu/~wcohen/10-605/randomized-1.pdf PDF]. |
* Lecture 2 [http://www.cs.cmu.edu/~wcohen/10-605/randomized-2.pptx Powerpoint], [http://www.cs.cmu.edu/~wcohen/10-605/randomized-2.pdf PDF]. | * Lecture 2 [http://www.cs.cmu.edu/~wcohen/10-605/randomized-2.pptx Powerpoint], [http://www.cs.cmu.edu/~wcohen/10-605/randomized-2.pdf PDF]. | ||
Line 9: | Line 9: | ||
* [https://qna.cs.cmu.edu/#/pages/view/83 Quiz for lecture 1] | * [https://qna.cs.cmu.edu/#/pages/view/83 Quiz for lecture 1] | ||
+ | * [https://qna.cs.cmu.edu/#/pages/view/217 Quiz for lecture 2] | ||
=== Sample Code === | === Sample Code === | ||
* [http://www.cs.cmu.edu/~wcohen/10-605/bloomfilter.py Python demo code for Bloom filter] | * [http://www.cs.cmu.edu/~wcohen/10-605/bloomfilter.py Python demo code for Bloom filter] | ||
+ | |||
+ | === Readings === | ||
+ | |||
+ | * William's [http://www.cs.cmu.edu/~wcohen/10-605/notes/randomized-algs.pdf lecture notes on randomized algorithms]. | ||
=== Optional Readings === | === Optional Readings === | ||
Line 19: | Line 24: | ||
* [http://www.cs.jhu.edu/~vandurme/papers/VanDurmeLallACL10.pdf Online Generation of Locality Sensitive Hash Signatures]. Benjamin Van Durme and Ashwin Lall. ACL Short. 2010 | * [http://www.cs.jhu.edu/~vandurme/papers/VanDurmeLallACL10.pdf Online Generation of Locality Sensitive Hash Signatures]. Benjamin Van Durme and Ashwin Lall. ACL Short. 2010 | ||
* [http://www.umiacs.umd.edu/~amit/Papers/goyalPointQueryEMNLP12.pdf Sketch Algorithms for Estimating Point Queries in NLP.] Amit Goyal, Hal Daume III, and Graham Cormode, EMNLP 2012] | * [http://www.umiacs.umd.edu/~amit/Papers/goyalPointQueryEMNLP12.pdf Sketch Algorithms for Estimating Point Queries in NLP.] Amit Goyal, Hal Daume III, and Graham Cormode, EMNLP 2012] | ||
+ | |||
+ | === Also discussed === | ||
+ | |||
+ | * [https://openreview.net/forum?id=r1br_2Kge Short and Deep: Sketching and Neural Networks: Amit Daniely, Nevena Lazic, Yoram Singer, Kunal Talwar, ICLR 2017] | ||
+ | * [https://dl.acm.org/citation.cfm?id=3078983 Lin, Jie, et al. "DeepHash for Image Instance Retrieval: Getting Regularization, Depth and Fine-Tuning Right." Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval. ACM, 2017.] | ||
=== Key things to remember === | === Key things to remember === |
Latest revision as of 11:34, 28 November 2017
This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-605 in Fall_2017.
Contents
Slides
- Lecture 1 Powerpoint, PDF.
- Lecture 2 Powerpoint, PDF.
Quizzes
Sample Code
Readings
- William's lecture notes on randomized algorithms.
Optional Readings
- Randomized Algorithms and NLP: Using Locality Sensitive Hash Functions for High Speed Noun Clustering Deepak Ravichandran, Patrick Pantel, and Eduard Hovy
- Online Generation of Locality Sensitive Hash Signatures. Benjamin Van Durme and Ashwin Lall. ACL Short. 2010
- Sketch Algorithms for Estimating Point Queries in NLP. Amit Goyal, Hal Daume III, and Graham Cormode, EMNLP 2012]
Also discussed
- Short and Deep: Sketching and Neural Networks: Amit Daniely, Nevena Lazic, Yoram Singer, Kunal Talwar, ICLR 2017
- Lin, Jie, et al. "DeepHash for Image Instance Retrieval: Getting Regularization, Depth and Fine-Tuning Right." Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval. ACM, 2017.
Key things to remember
- The API for the randomized methods we studied: Bloom filters, LSH, CM sketches, and LSH.
- The benefits of the online LSH method.
- The key algorithmic ideas behind these methods: random projections, hashing and allowing collisions, controlling probability of collisions with multiple hashes, and use of pooling to avoid storing many randomly-created objects.
- When you would use which technique.
- The relationship between hash kernels and CM sketches.
- What are the key tradeoffs associated with these methods, in terms of space/time efficiency and accuracy, and what sorts of errors are made by which algorithms (e.g., if they give over/under estimates, false positives/false negatives, etc).
- What guarantees are possible, and how space grows as you require more accuracy.
- Which algorithms allow one to combine sketches easily (i.e., when are the sketches additive).