Difference between revisions of "Class meeting for 10-605 Randomized Algorithms"

From Cohen Courses
Jump to navigationJump to search
 
Line 14: Line 14:
  
 
* [http://www.cs.cmu.edu/~wcohen/10-605/bloomfilter.py Python demo code for Bloom filter]
 
* [http://www.cs.cmu.edu/~wcohen/10-605/bloomfilter.py Python demo code for Bloom filter]
 +
 +
=== Readings ===
 +
 +
* William's [http://www.cs.cmu.edu/~wcohen/10-605/notes/randomized-algs.pdf lecture notes on randomized algorithms].
  
 
=== Optional Readings ===
 
=== Optional Readings ===

Latest revision as of 11:34, 28 November 2017

This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-605 in Fall_2017.

Slides

Quizzes

Sample Code

Readings

Optional Readings

Also discussed

Key things to remember

  • The API for the randomized methods we studied: Bloom filters, LSH, CM sketches, and LSH.
  • The benefits of the online LSH method.
  • The key algorithmic ideas behind these methods: random projections, hashing and allowing collisions, controlling probability of collisions with multiple hashes, and use of pooling to avoid storing many randomly-created objects.
  • When you would use which technique.
  • The relationship between hash kernels and CM sketches.
  • What are the key tradeoffs associated with these methods, in terms of space/time efficiency and accuracy, and what sorts of errors are made by which algorithms (e.g., if they give over/under estimates, false positives/false negatives, etc).
  • What guarantees are possible, and how space grows as you require more accuracy.
  • Which algorithms allow one to combine sketches easily (i.e., when are the sketches additive).