Class meeting for 10-605 Randomized Algorithms

The API for the randomized methods we studied (Bloom filters, LSH, and CM sketches), and specifically when you would use each technique.
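The three APIs can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration only: the names `_hash`, `BloomFilter`, `CountMinSketch` and `simhash_signature` are hypothetical, and salted `blake2b` digests stand in for a family of independent hash functions. The LSH part uses random-hyperplane signatures (an LSH family for cosine similarity), chosen as one example family.

```python
import hashlib
import random


def _hash(item: str, seed: int, modulus: int) -> int:
    """Deterministic hash of `item` into [0, modulus); each seed acts as a separate hash function."""
    digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % modulus


class BloomFilter:
    """Set membership. May give false positives, never false negatives."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.m, self.k = num_bits, num_hashes
        self.bits = [False] * num_bits

    def add(self, item: str) -> None:
        for i in range(self.k):
            self.bits[_hash(item, i, self.m)] = True

    def contains(self, item: str) -> bool:
        return all(self.bits[_hash(item, i, self.m)] for i in range(self.k))


class CountMinSketch:
    """Frequency estimates for non-negative counts. May overestimate, never underestimates."""

    def __init__(self, width: int, depth: int):
        self.w, self.d = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def add(self, item: str, count: int = 1) -> None:
        for row in range(self.d):
            self.table[row][_hash(item, row, self.w)] += count

    def estimate(self, item: str) -> int:
        # Every row can only over-count (collisions add), so the minimum is the tightest bound.
        return min(self.table[row][_hash(item, row, self.w)] for row in range(self.d))


def simhash_signature(vec, planes):
    """Random-hyperplane LSH for cosine similarity: one bit per hyperplane.
    Vectors separated by a small angle agree on most bits."""
    return tuple(int(sum(v * p for v, p in zip(vec, plane)) >= 0) for plane in planes)


# Usage: 16 random Gaussian hyperplanes for 3-dimensional vectors.
rng = random.Random(0)
planes = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(16)]
```

Roughly, a Bloom filter answers "have I seen x?", a CM sketch answers "about how often have I seen x?", and LSH answers "which items are probably similar to x?".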
The relationship between hash kernels and CM sketches.
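One way to see the relationship: the hashed feature vector of an example is exactly one row of a CM sketch of that example's feature counts, that is, a CM sketch of depth 1. The sketch below checks this under stated assumptions: the names `hashed_features` and `cm_sketch_rows` are hypothetical, and salted `blake2b` digests stand in for the hash functions. Many hash-kernel formulations also multiply each feature by a random ±1 sign hash so that collisions cancel in expectation; that signed variant is closer to a count sketch than to a CM sketch and is not shown here.

```python
import hashlib


def _hash(token: str, seed: int, modulus: int) -> int:
    """Deterministic hash into [0, modulus); each seed acts as a separate hash function."""
    digest = hashlib.blake2b(f"{seed}:{token}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % modulus


def hashed_features(tokens, num_buckets: int, seed: int = 0):
    """Hash kernel (feature hashing): fold a sparse bag of tokens into a
    fixed-length dense vector. Colliding tokens simply add together."""
    vec = [0] * num_buckets
    for tok in tokens:
        vec[_hash(tok, seed, num_buckets)] += 1
    return vec


def cm_sketch_rows(tokens, width: int, depth: int):
    """CM sketch of the same token counts: `depth` rows, row i hashed with seed i."""
    table = [[0] * width for _ in range(depth)]
    for tok in tokens:
        for row in range(depth):
            table[row][_hash(tok, row, width)] += 1
    return table
```

With matching seeds and widths, `hashed_features(tokens, w)` equals `cm_sketch_rows(tokens, w, d)[0]`. A learner then works directly on that single row instead of taking a minimum over several rows.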
The key tradeoffs associated with these methods, in terms of space/time efficiency and accuracy, and what sorts of errors each algorithm makes (e.g., whether it gives overestimates or underestimates, false positives or false negatives).
What guarantees are possible, and how space grows as you require more accuracy.
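For reference, these are the standard guarantees for Bloom filters and CM sketches. Here p denotes a target false-positive rate and a_i the true (non-negative) count of item i. They show how space grows with accuracy. Each halving of a Bloom filter's false-positive rate costs about 1.44 extra bits per item. Halving ε doubles a CM sketch's width, and going from δ to δ² doubles its depth.

```latex
% Bloom filter: m bits, k hash functions, n inserted items
\Pr[\text{false positive}] \approx \left(1 - e^{-kn/m}\right)^{k},
\qquad k^{*} = \frac{m}{n}\ln 2
\;\Rightarrow\; \Pr[\text{false positive}] \approx 2^{-k^{*}} \approx 0.6185^{m/n},
\qquad \frac{m}{n} \approx 1.44\,\log_2\frac{1}{p}

% Count-min sketch: width w, depth d, true counts a_i >= 0, estimate \hat{a}_i
w = \left\lceil \frac{e}{\varepsilon} \right\rceil,\quad
d = \left\lceil \ln\frac{1}{\delta} \right\rceil
\;\Rightarrow\;
a_i \le \hat{a}_i \ \text{always},\qquad
\Pr\!\left[\hat{a}_i \le a_i + \varepsilon \lVert a \rVert_1\right] \ge 1 - \delta,
\qquad w\,d = O\!\left(\tfrac{1}{\varepsilon}\ln\tfrac{1}{\delta}\right)
```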
Which algorithms allow one to combine sketches easily.
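Bloom filters and CM sketches combine easily when they share the same size parameters and the same hash functions. The union of two Bloom filters is their bitwise OR, and two CM sketches merge by elementwise addition, because both updates are order-independent. This minimal sketch assumes hypothetical shared parameters (`M_BITS`, `K_HASHES`, `WIDTH`, `DEPTH`) and hypothetical helper names. It stores the Bloom filter as a Python int bitmask.

```python
import hashlib

M_BITS, K_HASHES = 256, 3   # hypothetical shared Bloom filter parameters
WIDTH, DEPTH = 32, 4        # hypothetical shared CM sketch parameters


def _hash(item: str, seed: int, modulus: int) -> int:
    digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % modulus


def bloom_from(items) -> int:
    """Build a Bloom filter as an int bitmask."""
    bits = 0
    for item in items:
        for i in range(K_HASHES):
            bits |= 1 << _hash(item, i, M_BITS)
    return bits


def bloom_contains(bits: int, item: str) -> bool:
    return all((bits >> _hash(item, i, M_BITS)) & 1 for i in range(K_HASHES))


def bloom_union(a: int, b: int) -> int:
    # Same result as building one filter from both item sets.
    return a | b


def cms_from(items):
    table = [[0] * WIDTH for _ in range(DEPTH)]
    for item in items:
        for row in range(DEPTH):
            table[row][_hash(item, row, WIDTH)] += 1
    return table


def cms_merge(t1, t2):
    # Counter updates are additive, so merged sketch == sketch of the concatenated streams.
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(t1, t2)]
```

This makes both structures convenient for parallel settings: build one sketch per shard, then combine them.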

How to implement graph algorithms like PageRank by streaming through a graph, under various conditions:
- Vertex weights fit in memory
- Vertex weights do not fit in memory
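The two cases can be sketched in Python as follows. When vertex weights fit in memory, keep the old and new rank vectors in memory and make one sequential pass over the edge file per iteration. When they do not fit, one common approach is to co-stream a rank file and an adjacency file that are both sorted by node, emit (destination, contribution) messages, sort the messages, and sum each node's group in a second pass. Assumptions: the function names are hypothetical, `c` is the reset (teleport) probability, there are no dangling nodes, and in-memory lists and `list.sort()` stand in for disk files and an external sort.

```python
from itertools import groupby


def pagerank_in_memory(edge_stream, nodes, out_degree, c=0.15, iterations=20):
    """Rank vectors fit in memory; the edges are streamed once per iteration.

    edge_stream: zero-argument callable returning an iterator of (src, dst),
    standing in for re-reading an edge file from disk.
    Assumes every node has out-degree >= 1 (no dangling nodes).
    """
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: c / n for v in nodes}      # reset (teleport) mass
        for src, dst in edge_stream():             # one sequential pass
            new_rank[dst] += (1 - c) * rank[src] / out_degree[src]
        rank = new_rank
    return rank


def pagerank_sort_merge(adjacency, c=0.15, iterations=20):
    """Rank vectors do not fit in memory: stream, emit messages, sort, aggregate.

    adjacency: list of (node, [out-neighbours]) sorted by node, covering every
    node; it stands in for an adjacency file on disk. `ranks` stands in for a
    rank file sorted the same way, and list.sort() for an external sort.
    """
    n = len(adjacency)
    ranks = [(v, 1.0 / n) for v, _ in adjacency]
    for _ in range(iterations):
        messages = []
        # Co-stream the two sorted files: each node's rank lines up with its adjacency list.
        for (v, r), (u, nbrs) in zip(ranks, adjacency):
            if v != u:
                raise ValueError("rank and adjacency files are out of sync")
            messages.append((v, 0.0))  # so nodes without in-links still appear
            for w in nbrs:
                messages.append((w, (1 - c) * r / len(nbrs)))
        messages.sort()
        # After sorting, all messages for one node are adjacent: sum them in one pass.
        ranks = [(v, c / n + sum(m for _, m in group))
                 for v, group in groupby(messages, key=lambda t: t[0])]
    return dict(ranks)
```

Both versions compute the same iteration. The second version never holds more than one node's rank and adjacency list at a time, apart from the message sort.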
The meaning of various graph statistics: degree distribution, clustering coefficient, ...
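As a minimal sketch of the two named statistics, assume an undirected graph stored as a dict mapping each node to its set of neighbours (the function names are hypothetical). The degree distribution counts how many nodes have each degree. A node's local clustering coefficient is the fraction of pairs of its neighbours that are themselves linked. Nodes with degree below 2 get 0 here, a common convention.

```python
from collections import Counter
from itertools import combinations


def degree_distribution(adj):
    """Map degree k -> number of nodes with degree k."""
    return Counter(len(nbrs) for nbrs in adj.values())


def local_clustering(adj, v):
    """Fraction of pairs of v's neighbours that are linked to each other."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)


def average_clustering(adj):
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


# A triangle 0-1-2 with a pendant node 3 attached to node 2.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
```

On this example graph, node 2 has three neighbours but only one of its three neighbour pairs (0, 1) is linked, so its local clustering coefficient is 1/3.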
Why sampling from a graph is non-trivial if you want to preserve properties of the graph such as
- Degree distribution
- Homophily, as measured by the clustering coefficient
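A small experiment illustrates why naive sampling distorts these properties. It applies two uniform schemes to a complete graph on 40 nodes: inducing a subgraph on nodes kept with probability p, and keeping each edge with probability p. All names, the graph, and p = 0.5 are hypothetical choices for illustration. These are not the better-behaved sampling methods (such as random-walk-based ones) one would actually use.

```python
import random
from itertools import combinations


def complete_graph(n):
    return {v: set(range(n)) - {v} for v in range(n)}


def induced_node_sample(adj, p, rng):
    """Keep each node with probability p, plus all edges among kept nodes."""
    keep = {v for v in adj if rng.random() < p}
    return {v: adj[v] & keep for v in keep}


def random_edge_sample(adj, p, rng):
    """Keep every node, and each undirected edge independently with probability p."""
    sample = {v: set() for v in adj}
    for a in adj:
        for b in adj[a]:
            if a < b and rng.random() < p:
                sample[a].add(b)
                sample[b].add(a)
    return sample


def mean_degree(adj):
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def average_clustering(adj):
    """Mean over nodes of the fraction of neighbour pairs that are linked (0 if degree < 2)."""
    total = 0.0
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k >= 2:
            links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
            total += links / (k * (k - 1) / 2)
    return total / len(adj)


rng = random.Random(0)
full = complete_graph(40)
by_node = induced_node_sample(full, 0.5, rng)
by_edge = random_edge_sample(full, 0.5, rng)
```

Both samples shrink degrees by roughly a factor of p, so neither preserves the degree distribution. Node sampling keeps the clustering coefficient of this graph at 1. Edge sampling drives it down to roughly p, because a triangle survives with probability p³ while a wedge survives with probability p².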
What local graph partitioning is, and how the PageRank-Nibble algorithm, together with a sweep that minimizes conductance, can be used to solve it approximately.
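PageRank-Nibble, in the form given by Andersen, Chung and Lang, can be sketched in two steps. First, approximate a personalized PageRank vector p for a seed node with the "push" procedure, which touches only nodes near the seed. Second, sort the nodes in the support of p by p(v)/d(v) and sweep over the prefixes of that order, returning the prefix S with the smallest conductance cut(S)/min(vol(S), vol(V∖S)). The sketch uses the lazy-walk push rule. Its function names and the defaults `alpha=0.15` and `eps=1e-4` are hypothetical, and it assumes an undirected graph (dict of neighbour sets) with no self-loops and no isolated nodes.

```python
from collections import deque


def approximate_ppr(adj, seed, alpha=0.15, eps=1e-4):
    """Push-based approximate personalized PageRank; returns sparse p as a dict."""
    p, r = {}, {seed: 1.0}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        du = len(adj[u])
        ru = r.get(u, 0.0)
        if ru < eps * du:                      # stale queue entry: residual already small
            continue
        p[u] = p.get(u, 0.0) + alpha * ru      # keep an alpha fraction at u
        r[u] = (1 - alpha) * ru / 2            # lazy walk: half of the rest stays at u
        share = (1 - alpha) * ru / (2 * du)    # the other half is spread over neighbours
        for v in adj[u]:
            r[v] = r.get(v, 0.0) + share
            if r[v] >= eps * len(adj[v]):
                queue.append(v)
        if r[u] >= eps * du:
            queue.append(u)
    return p


def sweep_cut(adj, p):
    """Return the prefix of nodes (ordered by p(v)/d(v)) with minimum conductance."""
    total_vol = sum(len(nbrs) for nbrs in adj.values())
    order = sorted(p, key=lambda v: p[v] / len(adj[v]), reverse=True)
    in_set, vol, cut = set(), 0, 0
    best, best_phi = set(), float("inf")
    for v in order:
        inside = sum(1 for w in adj[v] if w in in_set)
        in_set.add(v)
        vol += len(adj[v])
        cut += len(adj[v]) - 2 * inside        # edges to the set stop being cut; the rest start
        denom = min(vol, total_vol - vol)
        if denom == 0:                         # the prefix is the whole graph
            break
        phi = cut / denom
        if phi < best_phi:
            best, best_phi = set(in_set), phi
    return best, best_phi
```

For example, on two 5-cliques joined by a single edge, seeding at a node of the first clique should recover that clique, with conductance 1/21.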
The implications of the analysis of PageRank-Nibble.
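One implication can be seen from the standard counting argument for the push procedure; conductance guarantees for the sweep are not reproduced here. A push at u is allowed only when r(u) ≥ ε d(u), and it moves α r(u) into p. Pushes conserve p + r, which starts at total mass 1, so ‖p‖₁ ≤ 1. Each push at u costs O(d(u)) work, so the total work, and the volume of the returned support, is O(1/(αε)). This bound is independent of the size of the whole graph, which is what makes the method local.

```latex
\sum_{t=1}^{T} \alpha\,\varepsilon\, d(u_t)
  \;\le\; \sum_{t=1}^{T} \alpha\, r_t(u_t)
  \;=\; \lVert p \rVert_1 \;\le\; 1
\quad\Longrightarrow\quad
\sum_{t=1}^{T} d(u_t) \;\le\; \frac{1}{\alpha\varepsilon},
\qquad
\operatorname{vol}\!\big(\operatorname{supp}(p)\big) \;\le\; \frac{1}{\alpha\varepsilon}
```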
