Difference between revisions of "Class meeting for 10-605 Randomized"
From Cohen Courses
(2 intermediate revisions by the same user not shown)
Latest revision as of 15:28, 11 August 2016
This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-605 in Fall 2016.
Slides
- TBD
Supplement:
Optional Readings
- Randomized Algorithms and NLP: Using Locality Sensitive Hash Functions for High Speed Noun Clustering. Deepak Ravichandran, Patrick Pantel, and Eduard Hovy.
- Online Generation of Locality Sensitive Hash Signatures. Benjamin Van Durme and Ashwin Lall. ACL Short. 2010.
- Sketch Algorithms for Estimating Point Queries in NLP. Amit Goyal, Hal Daume III, and Graham Cormode. EMNLP 2012.
Key things to remember
- The API for each of the randomized methods we studied (Bloom filters, LSH, and CM sketches) and, specifically, when you would use each technique.
- The relationship between hash kernels and CM sketches.
- The key tradeoffs associated with these methods in terms of space/time efficiency and accuracy, and the kinds of errors each algorithm makes (e.g., over- or underestimates, false positives or false negatives).
- What guarantees are possible, and how space grows as you require more accuracy.
- Which algorithms allow one to combine sketches easily.
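
To make the points above concrete, here is a minimal Python sketch of two of the APIs: a Bloom filter (add / membership test) and a count-min (CM) sketch (add / estimate), each with a merge operation. It is an illustration under stated assumptions, not the course's reference code: the class and method names are hypothetical, salted SHA-256 digests stand in for the independent hash families the analysis assumes, and the CM sizing uses the standard width = ceil(e / epsilon) and depth = ceil(ln(1 / delta)) choice.

```python
import hashlib
import math


def _hashes(item, count, modulus):
    """Derive `count` bucket indices for `item` from salted SHA-256 digests.

    Assumption: salted digests are a practical stand-in for the pairwise
    independent hash functions used in the formal guarantees.
    """
    out = []
    for seed in range(count):
        digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
        out.append(int.from_bytes(digest[:8], "big") % modulus)
    return out


class BloomFilter:
    """Set membership: false positives are possible, false negatives are not."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def add(self, item):
        for i in _hashes(item, self.num_hashes, self.num_bits):
            self.bits[i] = True

    def __contains__(self, item):
        return all(self.bits[i] for i in _hashes(item, self.num_hashes, self.num_bits))

    def union(self, other):
        """Combine two filters with identical parameters via bitwise OR."""
        assert (self.num_bits, self.num_hashes) == (other.num_bits, other.num_hashes)
        merged = BloomFilter(self.num_bits, self.num_hashes)
        merged.bits = [a or b for a, b in zip(self.bits, other.bits)]
        return merged


class CountMinSketch:
    """Frequency estimates that never fall below the true count
    (for non-negative updates), but may overestimate."""

    def __init__(self, epsilon, delta):
        # Standard sizing: the overestimate is at most epsilon * (total count)
        # with probability at least 1 - delta. Space grows as 1 / epsilon.
        self.width = math.ceil(math.e / epsilon)
        self.depth = math.ceil(math.log(1.0 / delta))
        self.table = [[0] * self.width for _ in range(self.depth)]

    def add(self, item, count=1):
        # Row r uses the hash function salted with seed r.
        for row, col in enumerate(_hashes(item, self.depth, self.width)):
            self.table[row][col] += count

    def estimate(self, item):
        # Collisions only inflate cells, so the minimum is the tightest bound.
        cols = _hashes(item, self.depth, self.width)
        return min(self.table[row][col] for row, col in enumerate(cols))

    def merge(self, other):
        """Combine two sketches with identical shape via cell-wise addition."""
        assert (self.width, self.depth) == (other.width, other.depth)
        merged = CountMinSketch.__new__(CountMinSketch)
        merged.width, merged.depth = self.width, self.depth
        merged.table = [
            [a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(self.table, other.table)
        ]
        return merged
```

In this sketch the error types are one-sided: the Bloom filter can report an item it never saw but never misses one it did see, and the CM sketch can overestimate a count but never underestimate it. Both structures combine easily when their parameters and hash functions match (bitwise OR for Bloom filters, cell-wise addition for CM sketches). One way to view the link to hash kernels is that a hash kernel also sums values into hashed buckets, but typically uses a single hashed vector rather than taking a minimum over several rows.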