Difference between revisions of "Class meeting for 10-605 Randomized Algorithms"
From Cohen Courses
Jump to navigationJump to search (Created page with "This is one of the class meetings on the schedule for the course Machine Learning with Large Datase...") |
|||
(24 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
− | This is one of the class meetings on the [[Syllabus for Machine Learning with Large Datasets 10-605 in Fall | + | This is one of the class meetings on the [[Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2017|schedule]] for the course [[Machine Learning with Large Datasets 10-605 in Fall_2017]]. |
=== Slides === | === Slides === | ||
− | * | + | * Lecture 1 [http://www.cs.cmu.edu/~wcohen/10-605/randomized-1.pptx Powerpoint], [http://www.cs.cmu.edu/~wcohen/10-605/randomized-1.pdf PDF]. |
+ | * Lecture 2 [http://www.cs.cmu.edu/~wcohen/10-605/randomized-2.pptx Powerpoint], [http://www.cs.cmu.edu/~wcohen/10-605/randomized-2.pdf PDF]. | ||
+ | === Quizzes === | ||
+ | |||
+ | * [https://qna.cs.cmu.edu/#/pages/view/83 Quiz for lecture 1] | ||
+ | * [https://qna.cs.cmu.edu/#/pages/view/217 Quiz for lecture 2] | ||
+ | |||
+ | === Sample Code === | ||
+ | |||
+ | * [http://www.cs.cmu.edu/~wcohen/10-605/bloomfilter.py Python demo code for Bloom filter] | ||
=== Readings === | === Readings === | ||
− | * [http://dl.acm.org/citation.cfm?id= | + | * William's [http://www.cs.cmu.edu/~wcohen/10-605/notes/randomized-algs.pdf lecture notes on randomized algorithms]. |
− | * [http://www. | + | |
− | * [ | + | === Optional Readings === |
+ | |||
+ | * [http://dl.acm.org/citation.cfm?id=1219840.1219917 Randomized Algorithms and NLP: Using Locality Sensitive Hash Functions for High Speed Noun Clustering] Deepak Ravichandran, Patrick Pantel, and Eduard Hovy | ||
+ | * [http://www.cs.jhu.edu/~vandurme/papers/VanDurmeLallACL10.pdf Online Generation of Locality Sensitive Hash Signatures]. Benjamin Van Durme and Ashwin Lall. ACL Short. 2010 | ||
+ | * [http://www.umiacs.umd.edu/~amit/Papers/goyalPointQueryEMNLP12.pdf Sketch Algorithms for Estimating Point Queries in NLP.] Amit Goyal, Hal Daume III, and Graham Cormode, EMNLP 2012] | ||
+ | |||
+ | === Also discussed === | ||
+ | |||
+ | * [https://openreview.net/forum?id=r1br_2Kge Short and Deep: Sketching and Neural Networks: Amit Daniely, Nevena Lazic, Yoram Singer, Kunal Talwar, ICLR 2017] | ||
+ | * [https://dl.acm.org/citation.cfm?id=3078983 Lin, Jie, et al. "DeepHash for Image Instance Retrieval: Getting Regularization, Depth and Fine-Tuning Right." Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval. ACM, 2017.] | ||
=== Key things to remember === | === Key things to remember === | ||
− | * | + | * The API for the randomized methods we studied: Bloom filters, LSH, CM sketches, and LSH. |
− | * | + | * The benefits of the online LSH method. |
− | + | * The key algorithmic ideas behind these methods: random projections, hashing and allowing collisions, controlling probability of collisions with multiple hashes, and use of pooling to avoid storing many randomly-created objects. | |
− | * The | + | * When you would use which technique. |
− | * | + | * The relationship between hash kernels and CM sketches. |
− | * | + | * What are the key tradeoffs associated with these methods, in terms of space/time efficiency and accuracy, and what sorts of errors are made by which algorithms (e.g., if they give over/under estimates, false positives/false negatives, etc). |
− | * | + | * What guarantees are possible, and how space grows as you require more accuracy. |
− | * What | + | * Which algorithms allow one to combine sketches easily (i.e., when are the sketches additive). |
− | * |
Latest revision as of 11:34, 28 November 2017
This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-605 in Fall_2017.
Contents
Slides
- Lecture 1 Powerpoint, PDF.
- Lecture 2 Powerpoint, PDF.
Quizzes
Sample Code
Readings
- William's lecture notes on randomized algorithms.
Optional Readings
- Randomized Algorithms and NLP: Using Locality Sensitive Hash Functions for High Speed Noun Clustering Deepak Ravichandran, Patrick Pantel, and Eduard Hovy
- Online Generation of Locality Sensitive Hash Signatures. Benjamin Van Durme and Ashwin Lall. ACL Short. 2010
- Sketch Algorithms for Estimating Point Queries in NLP. Amit Goyal, Hal Daume III, and Graham Cormode, EMNLP 2012]
Also discussed
- Short and Deep: Sketching and Neural Networks: Amit Daniely, Nevena Lazic, Yoram Singer, Kunal Talwar, ICLR 2017
- Lin, Jie, et al. "DeepHash for Image Instance Retrieval: Getting Regularization, Depth and Fine-Tuning Right." Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval. ACM, 2017.
Key things to remember
- The API for the randomized methods we studied: Bloom filters, LSH, CM sketches, and LSH.
- The benefits of the online LSH method.
- The key algorithmic ideas behind these methods: random projections, hashing and allowing collisions, controlling probability of collisions with multiple hashes, and use of pooling to avoid storing many randomly-created objects.
- When you would use which technique.
- The relationship between hash kernels and CM sketches.
- What are the key tradeoffs associated with these methods, in terms of space/time efficiency and accuracy, and what sorts of errors are made by which algorithms (e.g., if they give over/under estimates, false positives/false negatives, etc).
- What guarantees are possible, and how space grows as you require more accuracy.
- Which algorithms allow one to combine sketches easily (i.e., when are the sketches additive).