Difference between revisions of "Class meeting for 10-605 Randomized"

From Cohen Courses
 
* [http://www.cs.jhu.edu/~vandurme/papers/VanDurmeLallACL10.pdf Online Generation of Locality Sensitive Hash Signatures]. Benjamin Van Durme and Ashwin Lall.  ACL Short. 2010
 
 
* [http://www.umiacs.umd.edu/~amit/Papers/goyalPointQueryEMNLP12.pdf Sketch Algorithms for Estimating Point Queries in NLP]. Amit Goyal, Hal Daume III, and Graham Cormode. EMNLP 2012
 

Revision as of 17:46, 4 December 2015

This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-605 in Fall 2015.


Slides

Comment: I'm going to start off with a few slides related to the upcoming assignment on MF with Spark.

Supplement:

Optional Readings

Key things to remember

  • The API for the randomized methods we studied: Bloom filters, LSH, CM sketches
  • The key tradeoffs associated with these methods, in terms of space and time efficiency.
  • The guarantees each method can offer, and how space grows as you require more accuracy.
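
As a rough illustration of the points above, here is a minimal Python sketch of two of the three methods: a Bloom filter and a count-min (CM) sketch. It is not taken from the course materials. The class names, the `_hash` helper, and the use of salted BLAKE2b as the hash family are all assumptions made for the example, and the sizing formulas are the standard textbook ones. LSH is left out to keep the example short.

```python
import hashlib
import math


def _hash(item: str, seed: int, modulus: int) -> int:
    """Hypothetical helper: seeded hash of a string into range(modulus)."""
    digest = hashlib.blake2b(
        item.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little") % modulus


class BloomFilter:
    """Set membership with no false negatives and a tunable false-positive rate."""

    def __init__(self, n_items: int, fp_rate: float):
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.m = math.ceil(-n_items * math.log(fp_rate) / math.log(2) ** 2)
        self.k = max(1, round(self.m / n_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def add(self, item: str) -> None:
        # Set one bit per hash function.
        for seed in range(self.k):
            i = _hash(item, seed, self.m)
            self.bits[i // 8] |= 1 << (i % 8)

    def __contains__(self, item: str) -> bool:
        # Report "present" only if every hashed bit is set.
        for seed in range(self.k):
            i = _hash(item, seed, self.m)
            if not self.bits[i // 8] & (1 << (i % 8)):
                return False
        return True


class CountMinSketch:
    """Approximate counts that never underestimate (for non-negative updates)."""

    def __init__(self, eps: float, delta: float):
        # Standard sizing: width w = ceil(e / eps), depth d = ceil(ln(1 / delta)).
        # With probability at least 1 - delta, the overestimate is at most
        # eps times the total count added.
        self.w = math.ceil(math.e / eps)
        self.d = math.ceil(math.log(1 / delta))
        self.table = [[0] * self.w for _ in range(self.d)]

    def add(self, item: str, count: int = 1) -> None:
        # Increment one counter per row.
        for row in range(self.d):
            self.table[row][_hash(item, row, self.w)] += count

    def estimate(self, item: str) -> int:
        # The minimum over rows is the least-contaminated counter.
        return min(self.table[row][_hash(item, row, self.w)]
                   for row in range(self.d))


if __name__ == "__main__":
    bf = BloomFilter(n_items=1000, fp_rate=0.01)
    bf.add("spark")
    print("spark" in bf, bf.m, bf.k)

    cms = CountMinSketch(eps=0.01, delta=0.01)
    for word in ["the", "the", "cat"]:
        cms.add(word)
    print(cms.estimate("the"))
```

Under these formulas, Bloom filter space grows with log(1/p): tightening the target false-positive rate from 1% to 0.1% makes the bit array about 1.5 times larger. For the CM sketch, width grows linearly in 1/eps and depth grows only logarithmically in 1/delta.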