Difference between revisions of "Class meeting for 10-605 Randomized"

 
Line 4: Line 4:
 
=== Slides ===
 
  
Comment: I'm going to start off with a few slides related to the upcoming assignment on MF with Spark.
+
* TBD
 
 
* [http://www.cs.cmu.edu/~wcohen/10-605/spark-for-mf.pptx Spark for MF in PowerPoint], [http://www.cs.cmu.edu/~wcohen/10-605/spark-for-mf.pdf Spark for MF in PDF].
 
* [http://www.cs.cmu.edu/~wcohen/10-605/randomized-algs.pptx Randomized Algorithms in PowerPoint], [http://www.cs.cmu.edu/~wcohen/10-605/randomized-algs.pdf in PDF]
 
  
 
Supplement:
 

Latest revision as of 16:28, 11 August 2016

This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-605 in Fall 2016.


Slides

  • TBD

Supplement:

Optional Readings

Key things to remember

  • The API for each of the randomized methods we studied (Bloom filters, LSH, CM sketches) and, specifically, when you would use each technique.
  • The relationship between hash kernels and CM sketches.
  • The key tradeoffs associated with these methods in terms of space/time efficiency and accuracy, and which sorts of errors each algorithm makes (e.g., overestimates or underestimates, false positives or false negatives).
  • What guarantees are possible, and how space grows as you require more accuracy.
  • Which algorithms allow one to combine sketches easily.
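
To make the points above concrete, here is a minimal Python sketch of two of the methods, a Bloom filter and a count-min (CM) sketch. It is an illustration under stated assumptions, not the code used in the course: the class names, the method names, and the salted SHA-256 hashing are all hypothetical choices made for this example. It shows the shape of each API (add, then query), the one-sided error each structure makes, how the CM sketch's size follows from the standard accuracy parameters (width ceil(e/epsilon), depth ceil(ln(1/delta))), and how two sketches built with the same parameters can be combined. LSH is not covered here.

```python
import hashlib
import math


def _hashes(item, k, m):
    """Return k bucket indices in range(m) for item.

    Salted SHA-256 is an illustrative choice, not a tuned hash family.
    """
    indices = []
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
        indices.append(int.from_bytes(digest[:8], "big") % m)
    return indices


class BloomFilter:
    """Set membership: may give false positives, never false negatives."""

    def __init__(self, m, k):
        self.m, self.k = m, k          # m bits, k hash functions
        self.bits = [False] * m

    def add(self, item):
        for j in _hashes(item, self.k, self.m):
            self.bits[j] = True

    def __contains__(self, item):
        return all(self.bits[j] for j in _hashes(item, self.k, self.m))

    def union(self, other):
        """Combine two filters with identical m and k by OR-ing their bits."""
        assert (self.m, self.k) == (other.m, other.k)
        merged = BloomFilter(self.m, self.k)
        merged.bits = [a or b for a, b in zip(self.bits, other.bits)]
        return merged


class CountMinSketch:
    """Approximate counts for non-negative updates.

    Estimates may be too high (hash collisions) but are never too low.
    """

    def __init__(self, epsilon, delta):
        self.epsilon, self.delta = epsilon, delta
        # Tighter accuracy (smaller epsilon) widens the table; higher
        # confidence (smaller delta) adds rows.
        self.width = math.ceil(math.e / epsilon)
        self.depth = max(1, math.ceil(math.log(1.0 / delta)))
        self.table = [[0] * self.width for _ in range(self.depth)]

    def add(self, item, count=1):
        for row, col in enumerate(_hashes(item, self.depth, self.width)):
            self.table[row][col] += count

    def estimate(self, item):
        cols = _hashes(item, self.depth, self.width)
        return min(self.table[row][col] for row, col in enumerate(cols))

    def merge(self, other):
        """Combine two sketches with identical parameters by adding cells."""
        assert (self.width, self.depth) == (other.width, other.depth)
        merged = CountMinSketch(self.epsilon, self.delta)
        merged.table = [
            [a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(self.table, other.table)
        ]
        return merged
```

Both combine operations work only because the two structures being merged share the same sizes and hash functions; the merged result is what you would get by adding every item to a single structure.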