Difference between revisions of "Class meeting for 10-605 Randomized"

Revision as of 18:20, 6 December 2015

Comment: I'm going to start off with a few slides related to the upcoming assignment on MF with Spark.

Supplement:

The API for the randomized methods we studied (Bloom filters, LSH, CM sketches) and, specifically, when you would use each technique.
The relationship between hash kernels and CM sketches.
The key tradeoffs associated with these methods in terms of space/time efficiency and accuracy, and what sorts of errors each algorithm makes (e.g., over- or underestimates, false positives or false negatives).
What guarantees are possible, and how space grows as you require more accuracy.
Which algorithms allow one to combine sketches easily.
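
To make the points about error direction, space growth, and combining sketches concrete, here is a minimal count-min sketch in Python. It is an illustrative sketch under stated assumptions, not code from the course: the class name `CountMinSketch`, its method names, and the use of salted MD5 as a stand-in hash family are all hypothetical choices. The sizing uses the standard bound: width `ceil(e / epsilon)` and depth `ceil(ln(1 / delta))` give estimates that exceed the true count by at most `epsilon * total` with probability at least `1 - delta`.

```python
import hashlib
import math


class CountMinSketch:
    """Illustrative count-min sketch (hypothetical API, not the course's)."""

    def __init__(self, epsilon=0.01, delta=0.01):
        # Space grows as 1/epsilon (width) times ln(1/delta) (depth).
        self.width = math.ceil(math.e / epsilon)
        self.depth = math.ceil(math.log(1.0 / delta))
        self.table = [[0] * self.width for _ in range(self.depth)]
        self.total = 0

    def _index(self, item, row):
        # Assumption: salting MD5 with the row number stands in for a
        # family of independent hash functions.
        digest = hashlib.md5(f"{row}:{item}".encode("utf-8")).hexdigest()
        return int(digest, 16) % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count
        self.total += count

    def estimate(self, item):
        # Collisions can only inflate a cell, so with non-negative counts
        # the minimum over rows never underestimates the true count.
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))

    def merge(self, other):
        # Sketches built with the same shape and hashes combine by
        # cell-wise addition.
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketches must have the same shape to merge")
        for row in range(self.depth):
            for col in range(self.width):
                self.table[row][col] += other.table[row][col]
        self.total += other.total
```

Because merging is just cell-wise addition, each worker in a distributed job can build its own sketch over a shard of the data and the results can be summed afterwards, provided every worker uses the same width, depth, and hash functions.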

@@ Line 21: / Line 21: @@
 === Key things to remember ===
-* The API for the randomized methods we studied: Bloom filters, LSH, CM sketches
+* The API for the randomized methods we studied: Bloom filters, LSH, CM sketches, and specifically, when you would use which technique.
-* What are the key tradeoffs associated with these methods, in terms of space/time efficiency.
+* The relationship between hash kernels and CM sketches.
+* What are the key tradeoffs associated with these methods, in terms of space/time efficiency and accuracy, and what sorts of errors are made by which algorithms (e.g., if they give over/under estimates, false positives/false negatives, etc.).
 * What guarantees are possible, and how space grows as you require more accuracy.
 * Which algorithms allow one to combine sketches easily.
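
As an illustration of the error types and the ease of combining listed under "Key things to remember", here is a minimal Bloom filter in Python. It is a sketch under stated assumptions rather than the course's implementation: the class name `BloomFilter`, its parameters `num_bits` and `num_hashes`, and the salted SHA-256 hashing are hypothetical choices made for the example.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter (hypothetical API, not the course's)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Assumption: salting SHA-256 with the hash number stands in for
        # a family of independent hash functions.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # False positives are possible (all bits set by other items);
        # false negatives are not, because bits are never cleared.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def union(self, other):
        # Filters with the same size and hashes combine by bitwise OR.
        if (self.num_bits, self.num_hashes) != (other.num_bits, other.num_hashes):
            raise ValueError("filters must share size and hash count")
        merged = BloomFilter(self.num_bits, self.num_hashes)
        merged.bits = self.bits | other.bits
        return merged
```

A membership test that returns `False` is always correct, while one that returns `True` may be a false positive; adding more bits per stored item lowers the false-positive rate at the cost of space.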