Difference between revisions of "Class meeting for 10-605 Randomized"

From Cohen Courses
Jump to navigationJump to search
 
(8 intermediate revisions by the same user not shown)
Line 1: Line 1:
This is one of the class meetings on the [[Syllabus for Machine Learning with Large Datasets 10-605 in Spring 2015|schedule]] for the course [[Machine Learning with Large Datasets 10-605 in Spring_2015]].
+
This is one of the class meetings on the [[Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2016|schedule]] for the course [[Machine Learning with Large Datasets 10-605 in Fall_2016]].
  
  
 
=== Slides ===
 
=== Slides ===
  
* [http://www.cs.cmu.edu/~wcohen/10-605/randomized-algs.pptx Slides in Powerpoint]
+
* TBD
  
 +
Supplement:
  
 
* [http://www.cs.cmu.edu/~wcohen/10-605/bloomfilter.py Python demo code for Bloom filter]
 
* [http://www.cs.cmu.edu/~wcohen/10-605/bloomfilter.py Python demo code for Bloom filter]
Line 11: Line 12:
 
=== Optional Readings ===
 
=== Optional Readings ===
  
 +
* [http://dl.acm.org/citation.cfm?id=1219840.1219917 Randomized Algorithms and NLP: Using Locality Sensitive Hash Functions for High Speed Noun Clustering] Deepak Ravichandran, Patrick Pantel, and Eduard Hovy
 
* [http://www.cs.jhu.edu/~vandurme/papers/VanDurmeLallACL10.pdf Online Generation of Locality Sensitive Hash Signatures]. Benjamin Van Durme and Ashwin Lall.  ACL Short. 2010
 
* [http://www.cs.jhu.edu/~vandurme/papers/VanDurmeLallACL10.pdf Online Generation of Locality Sensitive Hash Signatures]. Benjamin Van Durme and Ashwin Lall.  ACL Short. 2010
 
* [http://www.umiacs.umd.edu/~amit/Papers/goyalPointQueryEMNLP12.pdf Sketch Algorithms for Estimating Point Queries in NLP.]  Amit Goyal, Hal Daume III, and Graham Cormode, EMNLP 2012]
 
* [http://www.umiacs.umd.edu/~amit/Papers/goyalPointQueryEMNLP12.pdf Sketch Algorithms for Estimating Point Queries in NLP.]  Amit Goyal, Hal Daume III, and Graham Cormode, EMNLP 2012]
 +
 +
=== Key things to remember ===
 +
 +
* The API for the randomized methods we studied: Bloom filters, LSH, CM sketches, and specifically, when you would use which technique.
 +
* The relationship between hash kernels and CM sketches.
 +
* What are the key tradeoffs associated with these methods, in terms of space/time efficiency and accuracy, and what sorts of errors are made by which algorithms (e.g., if they give over/under estimates, false positives/false negatives, etc).
 +
* What guarantees are possible, and how space grows as you require more accuracy.
 +
* Which algorithms allow one to combine sketches easily.

Latest revision as of 15:28, 11 August 2016

This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-605 in Fall_2016.


Slides

  • TBD

Supplement:

Optional Readings

Key things to remember

  • The API for the randomized methods we studied: Bloom filters, LSH, CM sketches, and specifically, when you would use which technique.
  • The relationship between hash kernels and CM sketches.
  • What are the key tradeoffs associated with these methods, in terms of space/time efficiency and accuracy, and what sorts of errors are made by which algorithms (e.g., if they give over/under estimates, false positives/false negatives, etc).
  • What guarantees are possible, and how space grows as you require more accuracy.
  • Which algorithms allow one to combine sketches easily.