Class meeting for 10-405 Randomized Algorithms
From Cohen Courses
Revision as of 13:35, 2 April 2018 by Wcohen
This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-405 in Spring 2018.
- William's lecture notes on randomized algorithms (covering Bloom filters and countmin sketches).
- Online Generation of Locality Sensitive Hash Signatures. Benjamin Van Durme and Ashwin Lall. ACL Short. 2010
- Randomized Algorithms and NLP: Using Locality Sensitive Hash Functions for High Speed Noun Clustering Deepak Ravichandran, Patrick Pantel, and Eduard Hovy
- Sketch Algorithms for Estimating Point Queries in NLP. Amit Goyal, Hal Daume III, and Graham Cormode, EMNLP 2012]
- Short and Deep: Sketching and Neural Networks: Amit Daniely, Nevena Lazic, Yoram Singer, Kunal Talwar, ICLR 2017
- Lin, Jie, et al. "DeepHash for Image Instance Retrieval: Getting Regularization, Depth and Fine-Tuning Right." Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval. ACM, 2017.
Key things to remember
- The API for the randomized methods we studied: Bloom filters, LSH, CM sketches, and LSH.
- The benefits of the online LSH method.
- The key algorithmic ideas behind these methods: random projections, hashing and allowing collisions, controlling probability of collisions with multiple hashes, and use of pooling to avoid storing many randomly-created objects.
- When you would use which technique.
- The relationship between hash kernels and CM sketches.
- What are the key tradeoffs associated with these methods, in terms of space/time efficiency and accuracy, and what sorts of errors are made by which algorithms (e.g., if they give over/under estimates, false positives/false negatives, etc).
- What guarantees are possible, and how space grows as you require more accuracy.
- Which algorithms allow one to combine sketches easily (i.e., when are the sketches additive).