Difference between revisions of "Class meeting for 10-405 Randomized Algorithms"

From Cohen Courses
Jump to navigationJump to search
(Created page with "This is one of the class meetings on the schedule for the course Machine Learning with Large Data...")
(No difference)

Revision as of 15:21, 15 January 2018

This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-405 in Spring 2018.

Slides

Quizzes

Sample Code

Readings

Optional Readings

Also discussed

Key things to remember

  • The API for the randomized methods we studied: Bloom filters, LSH, CM sketches, and LSH.
  • The benefits of the online LSH method.
  • The key algorithmic ideas behind these methods: random projections, hashing and allowing collisions, controlling probability of collisions with multiple hashes, and use of pooling to avoid storing many randomly-created objects.
  • When you would use which technique.
  • The relationship between hash kernels and CM sketches.
  • What are the key tradeoffs associated with these methods, in terms of space/time efficiency and accuracy, and what sorts of errors are made by which algorithms (e.g., if they give over/under estimates, false positives/false negatives, etc).
  • What guarantees are possible, and how space grows as you require more accuracy.
  • Which algorithms allow one to combine sketches easily (i.e., when are the sketches additive).