Class meeting for 10-605 Randomized Algorithms

 
This is one of the class meetings on the [[Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2017|schedule]] for the course [[Machine Learning with Large Datasets 10-605 in Fall_2017]].
  
 
=== Slides ===
  
* Lecture 1 [http://www.cs.cmu.edu/~wcohen/10-605/randomized-1.pptx Powerpoint], [http://www.cs.cmu.edu/~wcohen/10-605/randomized-1.pdf PDF].
* Lecture 2 [http://www.cs.cmu.edu/~wcohen/10-605/randomized-2.pptx Powerpoint], [http://www.cs.cmu.edu/~wcohen/10-605/randomized-2.pdf PDF].
  
=== Quizzes ===
  
* [https://qna.cs.cmu.edu/#/pages/view/83 Quiz for lecture 1]
* [https://qna.cs.cmu.edu/#/pages/view/217 Quiz for lecture 2]
  
=== Sample Code ===
  
* [http://www.cs.cmu.edu/~wcohen/10-605/bloomfilter.py Python demo code for Bloom filter]
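For orientation, a minimal Bloom filter sketch appears below (an illustrative toy, not the linked demo code; the bit-array size and the salted-hash scheme are arbitrary choices). The idea: k salted hashes set k bits per item, so lookups can give false positives but never false negatives.

<pre>
# Toy Bloom filter: m bits, k hash positions per item, derived by
# salting a single base hash. Membership tests have no false
# negatives, but may return false positives.
import hashlib

class BloomFilter:
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = [False] * m

    def _positions(self, x):
        # Derive k bit positions by hashing x with k different salts.
        for i in range(self.k):
            h = hashlib.sha1(("%d:%s" % (i, x)).encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, x):
        for p in self._positions(x):
            self.bits[p] = True

    def contains(self, x):
        # True if all k bits are set; may be a false positive if other
        # items happened to set the same bits.
        return all(self.bits[p] for p in self._positions(x))

bf = BloomFilter()
bf.add("apple")
print(bf.contains("apple"))   # True: never a false negative
print(bf.contains("banana"))  # usually False; True would be a false positive
</pre>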
  
=== Readings ===
  
Supplement:
* William's [http://www.cs.cmu.edu/~wcohen/10-605/notes/randomized-algs.pdf lecture notes on randomized algorithms].
 
 
=== Optional Readings ===
 
* [http://www.cs.jhu.edu/~vandurme/papers/VanDurmeLallACL10.pdf Online Generation of Locality Sensitive Hash Signatures]. Benjamin Van Durme and Ashwin Lall. ACL Short. 2010. (A toy version of the online signature update is sketched at the end of this section.)
 
 
* [http://www.umiacs.umd.edu/~amit/Papers/goyalPointQueryEMNLP12.pdf Sketch Algorithms for Estimating Point Queries in NLP.] Amit Goyal, Hal Daume III, and Graham Cormode, EMNLP 2012
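To make the online LSH idea concrete, here is a toy sketch in the spirit of the Van Durme & Lall paper above (the pooling scheme, sizes, and names are illustrative assumptions, not the paper's actual construction). Each signature bit is the sign of a running random projection, so the bit signature can be read off at any point in the stream, and a shared pool of random values stands in for explicit projection vectors.

<pre>
# Toy online random-projection LSH. We keep B running dot products
# for one item and fold in (feature, count) observations as they
# stream by; the B-bit signature is available at any time.
import random

random.seed(0)
POOL_SIZE = 10000                      # illustrative pool size
POOL = [random.gauss(0.0, 1.0) for _ in range(POOL_SIZE)]
B = 32                                 # signature length in bits

def weight(i, feature):
    # Look up projection i's weight for `feature` in the shared pool
    # instead of materializing B full random vectors. (Python's built-in
    # hash() is stable within one process, which is all a demo needs.)
    return POOL[hash((i, feature)) % POOL_SIZE]

sums = [0.0] * B                       # running projections for one item

def update(feature, count=1.0):
    # Streaming update: add count * r_i[feature] to every projection.
    for i in range(B):
        sums[i] += count * weight(i, feature)

def signature():
    # Bit i is the sign of the i-th running projection.
    return [1 if s >= 0 else 0 for s in sums]

for word in "a stream of words a stream".split():
    update(word)
print(signature())
</pre>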
 
=== Also discussed ===
* [https://openreview.net/forum?id=r1br_2Kge Short and Deep: Sketching and Neural Networks. Amit Daniely, Nevena Lazic, Yoram Singer, Kunal Talwar, ICLR 2017]
* [https://dl.acm.org/citation.cfm?id=3078983 Lin, Jie, et al. "DeepHash for Image Instance Retrieval: Getting Regularization, Depth and Fine-Tuning Right." Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval. ACM, 2017.]
  
 
=== Key things to remember ===
  
* The API for the randomized methods we studied: Bloom filters, LSH, and CM sketches.
* The benefits of the online LSH method.
* The key algorithmic ideas behind these methods: random projections, hashing and allowing collisions, controlling the probability of collisions with multiple hashes, and the use of pooling to avoid storing many randomly-created objects.
* When you would use which technique.
 
* The relationship between hash kernels and CM sketches.
 
 
* The key tradeoffs associated with these methods, in terms of space/time efficiency and accuracy, and what sorts of errors are made by which algorithms (e.g., whether they give over- or under-estimates, false positives or false negatives, etc.).
 
 
* What guarantees are possible, and how space grows as you require more accuracy.
 
* Which algorithms allow one to combine sketches easily (i.e., when the sketches are additive; see the count-min example below).
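To make the last few points concrete, here is a toy count-min sketch (width, depth, and hashing details are illustrative choices, not a reference implementation). It shows two properties flagged above: estimates only ever over-count, since collisions add and never subtract (the same feature-hashing trick, plus a sign hash, underlies hash kernels), and two sketches built with the same hash functions combine by entrywise addition, i.e., the sketches are additive.

<pre>
# Toy count-min sketch: a depth x width table of counters, with one
# hash function per row. Estimates are upward-biased; taking the min
# over rows gives the tightest estimate.
import hashlib

class CountMin:
    def __init__(self, width=1000, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _col(self, row, x):
        h = hashlib.md5(("%d:%s" % (row, x)).encode()).hexdigest()
        return int(h, 16) % self.width

    def add(self, x, count=1):
        for row in range(self.depth):
            self.table[row][self._col(row, x)] += count

    def estimate(self, x):
        # Every row over-counts whenever another item collides with x,
        # so the min over rows is still an over-estimate (never under).
        return min(self.table[row][self._col(row, x)]
                   for row in range(self.depth))

    def merge(self, other):
        # Additivity: summing tables entrywise yields the sketch of the
        # combined stream, so shards can be sketched independently.
        for row in range(self.depth):
            for j in range(self.width):
                self.table[row][j] += other.table[row][j]

cm1, cm2 = CountMin(), CountMin()
cm1.add("apple", 3)
cm2.add("apple", 2)
cm2.add("pear", 5)
cm1.merge(cm2)
print(cm1.estimate("apple"))  # at least 5; exactly 5 unless collisions
</pre>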
 
 
 
