Difference between revisions of "Class meeting for 10-605 Randomized"

Latest revision as of 15:28, 11 August 2016

This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-605 in Fall 2016.

Slides

TBD

Supplement:

Python demo code for Bloom filter
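The linked demo code is not reproduced on this page. As a stand-in, here is a minimal sketch of a Bloom filter, not the course's actual demo. The class name `BloomFilter`, the parameters `num_bits` and `num_hashes`, and the use of salted SHA-256 to derive the hash positions are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Set-membership sketch: may give false positives, never false negatives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits        # hypothetical parameter name
        self.num_hashes = num_hashes    # hypothetical parameter name
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Assumption: derive k positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # True only if every position is set; other items may have set them,
        # which is where false positives come from.
        return all(self.bits[pos] for pos in self._positions(item))

    def union(self, other):
        """Combine two filters built with identical parameters (bitwise OR)."""
        if (self.num_bits, self.num_hashes) != (other.num_bits, other.num_hashes):
            raise ValueError("filters must share num_bits and num_hashes")
        merged = BloomFilter(self.num_bits, self.num_hashes)
        merged.bits = bytearray(a | b for a, b in zip(self.bits, other.bits))
        return merged


seen = BloomFilter(num_bits=1024, num_hashes=3)
seen.add("apple")
print("apple" in seen)  # an added item is always reported as present
```

Any item that was added is always found. An item that was never added is usually rejected, but it can be reported as present if other items happened to set all of its bits.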

Optional Readings

Randomized Algorithms and NLP: Using Locality Sensitive Hash Functions for High Speed Noun Clustering. Deepak Ravichandran, Patrick Pantel, and Eduard Hovy.
Online Generation of Locality Sensitive Hash Signatures. Benjamin Van Durme and Ashwin Lall. ACL Short. 2010.
Sketch Algorithms for Estimating Point Queries in NLP. Amit Goyal, Hal Daume III, and Graham Cormode. EMNLP 2012.

Key things to remember

The API for the randomized methods we studied: Bloom filters, LSH, CM sketches, and specifically, when you would use which technique.
The relationship between hash kernels and CM sketches.
What are the key tradeoffs associated with these methods, in terms of space/time efficiency and accuracy, and what sorts of errors are made by which algorithms (e.g., whether they give over- or underestimates, false positives or false negatives, etc.).
What guarantees are possible, and how space grows as you require more accuracy.
Which algorithms allow one to combine sketches easily.
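Several of the points above can be made concrete with a minimal sketch of a count-min (CM) sketch. It is an illustration under stated assumptions, not the course's implementation. The names `CountMinSketch`, `epsilon`, and `delta` are hypothetical, and salted SHA-256 stands in for the pairwise-independent hash family that the formal analysis assumes. The sizing uses the standard textbook choice: width ceil(e / epsilon) and depth ceil(ln(1 / delta)). With non-negative counts, an estimate is never below the true count, and it exceeds the true count by more than epsilon times the total count with probability at most delta.

```python
import hashlib
import math


class CountMinSketch:
    """Approximate counter: estimates can be too high, never too low."""

    def __init__(self, epsilon, delta):
        # Space grows as 1/epsilon (accuracy) times log(1/delta) (confidence).
        self.width = math.ceil(math.e / epsilon)
        self.depth = math.ceil(math.log(1.0 / delta))
        self.table = [[0] * self.width for _ in range(self.depth)]

    def _column(self, row, item):
        # Assumption: one salted SHA-256 hash per row, for illustration only.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._column(row, item)] += count

    def estimate(self, item):
        # Collisions only add to a cell, so the minimum is the tightest bound.
        return min(self.table[row][self._column(row, item)]
                   for row in range(self.depth))

    def merge(self, other):
        """Combine sketches of two data shards by adding tables cell by cell."""
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketches must share width and depth")
        merged = CountMinSketch.__new__(CountMinSketch)
        merged.width, merged.depth = self.width, self.depth
        merged.table = [[a + b for a, b in zip(row_a, row_b)]
                        for row_a, row_b in zip(self.table, other.table)]
        return merged


shard_a = CountMinSketch(epsilon=0.01, delta=0.01)
shard_b = CountMinSketch(epsilon=0.01, delta=0.01)
shard_a.add("the", 5)
shard_b.add("the", 3)
combined = shard_a.merge(shard_b)
print(combined.estimate("the"))  # at least 8, the true total count
```

Merging is exact here: the combined table is identical to the one a single sketch would hold after seeing both shards, provided both sketches use the same dimensions and hash functions. The Bloom filter combines in the same spirit, with a bitwise OR of bit arrays of equal size. One rough way to view the connection to hash kernels is that a hashed feature vector resembles a single row of this table.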

@@ Line 1: / Line 1: @@
-This is one of the class meetings on the [[Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2015|schedule]] for the course [[Machine Learning with Large Datasets 10-605 in Fall_2015]].
+This is one of the class meetings on the [[Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2016|schedule]] for the course [[Machine Learning with Large Datasets 10-605 in Fall_2016]].
 === Slides ===
-Comment: I'm going to start off with a few slides related to the upcoming assignment on MF with Spark.
+* TBD
-* [http://www.cs.cmu.edu/~wcohen/10-605/spark-for-mf.pptx Spark for MF in PowerPoint], [http://www.cs.cmu.edu/~wcohen/10-605/spark-for-mf.pdf Spark for MF in PDF].
-* [http://www.cs.cmu.edu/~wcohen/10-605/randomized-algs.pptx Randomized Algorithms in Powerpoint], [http://www.cs.cmu.edu/~wcohen/10-605/randomized-algs.pdf in PDF]
 Supplement:
@@ Line 21: / Line 18: @@
 === Key things to remember ===
-* The API for the randomized methods we studied: Bloom filters, LSH, CM sketches
+* The API for the randomized methods we studied: Bloom filters, LSH, CM sketches, and specifically, when you would use which technique.
-* What are the key tradeoffs associated with these methods, in terms of space/time efficiency.
+* The relationship between hash kernels and CM sketches.
+* What are the key tradeoffs associated with these methods, in terms of space/time efficiency and accuracy, and what sorts of errors are made by which algorithms (e.g., if they give over/under estimates, false positives/false negatives, etc).
 * What guarantees are possible, and how space grows as you require more accuracy.
 * Which algorithms allow one to combine sketches easily.
